Abstract: A variety of mathematical tools have been developed for predicting spreading patterns in a range of environments, including infectious diseases, computer viruses, and urgent messages broadcast to mobile agents (e.g., humans, vehicles, and mobile devices). These tools have mainly focused on estimating the average time for the spread to reach a fraction (e.g., $\alpha$) of the agents, i.e., the so-called average completion time $E[T_\alpha]$. We claim that providing a probabilistic guarantee on the spread time $T_\alpha$, rather than only its average, gives a much better understanding of the spread, and hence could be used to design improved methods to prevent epidemics or to devise accelerated methods for distributing data. To demonstrate the benefits, we introduce a new metric $G_{\alpha,\beta}$ that denotes the time required to guarantee $\alpha$ completion with probability $\beta$, and develop a new framework to characterize the distribution of $T_\alpha$ for various spread parameters such as the number of seeds, the level of contact rates, and the heterogeneity in contact rates. We apply our technique to an experimental mobility trace of taxis in Shanghai and show that our framework enables us to allocate resources (i.e., to control spread parameters) for accelerating the spread far more efficiently than the state of the art.

I. Introduction 

Spreading patterns of pandemics [1], computer viruses [2], and information [3], [4] have been widely studied in various research disciplines including epidemiology, biology, physics, sociology, and computer networks. In these disciplines, most studies have been devoted to characterizing how a spread evolves over time in a network of mobile agents including humans, vehicles, and mobile devices.¹ These studies can be classified into two groups based on their objectives. Interestingly, the two objectives lie in opposite directions: slowing down or accelerating the spread. For research that deals with biological and electronic viruses, how to slow down the spread has been the most important question. On the other hand, research on computer data and information distribution has pursued engineering methods to accelerate the spread.

Whatever the goal, existing studies have relied on common mathematical frameworks such as the branching process, mean-field approximation, and stochastic differential equations [5]. Due to the characteristics of these frameworks, the spread of a virus or information has generally been analyzed in terms of its average behavior under the various epidemic models summarized in [6], where an epidemic model defines whether agents are recoverable² or not and whether they become immune after recovery or remain susceptible to infection. Here, average behavior typically means $E[N_t]$, where $N_t$ denotes the number of infected nodes in the network at time $t$.

Average analysis answers the question of how many nodes are infected (or informed) on average under a specific epidemic model after a time duration $t$ from the emergence of a virus (or the generation of information). There have been many extensions to this analysis through the aforementioned frameworks. The authors of [7] identified how much a network topology affects the speed of virus spreading, and the authors of [8] derived a closed-form equation for the critical level of the virus infection rate that allows a virus to persist in a network when nodes recover at a certain rate. More realistic average spread behaviors of a virus, accounting for the heterogeneity inherent in human mobility patterns, have been studied through simulations in [9]. In computer networks, [10] analyzed the average propagation behavior of the Code Red worm in the Internet using measurement data from ISPs and an epidemic model. [3] applied the understanding of the average behavior of virus spread to information propagation in delay tolerant networks. Similarly, [2] analyzed the average spread behaviors of self-propagating worms on the Internet using a branching process.

While there has been a plethora of work on average analysis, the problem of allocating optimal amounts of resources to a network of nodes for slowing down or accelerating a spread has been under-explored.³ Specifically, the higher-order spread behaviors over time (beyond average behaviors) that are needed to design optimal resource allocation have not been well understood. The right question is what the distribution of the number of infected nodes at time $t$ will be, which is equivalent to asking for the temporal distribution of the event that $n$ nodes are infected. Characterizing the temporal distribution of spread allows one to guarantee the spread time with high probability, and it leads to control knobs for allocating resources to a network according to its own purpose of spread. However, understanding the temporal distribution involves non-trivial challenges since contact events among nodes in a network are highly diverse.

² A virus from which nodes cannot recover can be considered identical to undeletable or unforgettable information.

³ [11] studied the optimal allocation of the wireless channels of a carrier to mobile nodes in a content delivery network, which maximizes the sum utility defined with the content delivery time to the nodes. In that work, a bound on the content delivery time was studied, but its exact distribution was left unsolved.

In order to address the challenges involved in obtaining a deeper understanding of resource allocation, in this paper we propose a new analytical framework based on the CTMC (continuous time Markov chain), which allows us to fully characterize the temporal aspect of spread behaviors. For simplicity, we put our emphasis on information distribution among intermittently meeting mobile nodes forming an opportunistic network, i.e., a mobile social network, but our results are easily applicable to the general spread of epidemics. Our framework is capable of answering many intriguing engineering questions such as "What is the distribution of the time for a network to reach a 75% penetration rate?" and "If 75% penetration is the target, by what time can that level of penetration be guaranteed with 99% confidence?". It can also answer a more fundamental question involving the heterogeneity of nodes in a network: "Does heterogeneity help or hurt spreading?" We show the efficacy of our solution in answering these questions using one of the largest experimental GPS (global positioning system) traces of taxis in Shanghai. Our simulation studies on the trace provide added verification that our framework is robust and enables us to engineer the network far more efficiently than existing understandings of spread.

The rest of the paper is organized as follows. In Section II, we provide our system model along with definitions of the relevant metrics. In Section III, we develop our analytical framework and present the major analytical results. Based on our framework, we characterize the temporal distribution of spread behavior and provide its applications in Section IV. We present simulation studies using the Shanghai taxi trace in Section V and conclude the paper in Section VI.



II. Model Description 

A. Overview of Epidemic Models 

In classic epidemiology, an individual (i.e., node) is classified as either susceptible, infected, or removed (or immune) according to its status for a disease [5]. A susceptible individual is one who is not infected yet, but is prone to be infected. An infected individual is one who already has the disease and is capable of spreading it to susceptible individuals. A removed individual is one who was previously infected but became immune to the disease. These three classifications are conventionally denoted by S, I, and R, respectively, and induce the SIS, SIR, and SI epidemic models and their variants. In this paper, we focus on the SI model, in which once a susceptible individual is infected, it stays infected for the remainder of the epidemic process. The SI model fits particularly well with information spread in opportunistic networks: once a data item is delivered to an individual, it is considered delivered to the upper layer and is no longer required (i.e., the individual is permanently infected).



B. Our System Model 

We consider a network (or a population) consisting of $N$ mobile nodes. We assume that all nodes in the network can be classified into $K$ different types according to their mobility patterns and epidemic attributes. We denote the collection of the $k$th type of nodes as group $k$ ($k = 1, \ldots, K$). Let $N_k$ be the number of nodes in group $k$ and denote $\mathbf{N} = (N_k)_{1 \le k \le K}$. Then, we have $|\mathbf{N}| = \sum_k N_k = N$. (Throughout this paper, we use a bold symbol for a vector or a matrix. In addition, for a vector $\mathbf{V} = (V_k)$, we define the operation $|\mathbf{V}|$ as $|\mathbf{V}| = \sum_k V_k$.)

In our model, the mechanism of information (or packet or virus) spread is as follows: initially, the information is delivered to some selected nodes, which we call seeds.⁴ Whenever a seed, say node $a$, meets a susceptible node that does not have the information yet, it spreads the information to the susceptible node with probability $\varphi_a \in (0, 1]$. Then, the susceptible node, say node $b$, receives the information successfully with probability $\psi_b \in (0, 1]$ and becomes infected (or informed). Once the susceptible node becomes infected, it stays infected for the remainder of the spreading process, and takes part in disseminating the information in the same manner as the seed. The spreading process ends when all nodes in the network have obtained the information. In our spreading model, the probabilities $\varphi_a$ and $\psi_b$ can be interpreted as the infectivity and the susceptibility of nodes $a$ and $b$, respectively. For instance, in the case of rumor propagation, $\varphi_a$ quantifies the tendency of person $a$ to gossip, while $\psi_b$ quantifies how receptive a listener $b$ is to the rumor. For the case of packet dissemination in an opportunistic network, $\varphi_a$ represents the probability that node $a$ schedules the transmission of a packet, and $\psi_b$ represents the probability of successful packet reception at node $b$, which depends on, e.g., the contact period, the number of contending nodes, and channel conditions.

The stochastic characteristic of the pairwise meeting process is a critical factor that determines the temporal behavior of the spreading process. It has recently been shown in the literature that the time duration between two consecutive contacts of a pair of nodes, called the pairwise inter-contact time, can be modeled by an exponential random variable, e.g., [12]-[14]. In [12], exponential inter-contact patterns are validated experimentally using three different mobility data sets. When nodes follow Lévy flight mobility, which is known to closely mimic human mobility patterns [15], the authors of [14] mathematically proved that the inter-contact time distribution is bounded by an exponential distribution. Thus, in this paper we assume that the pairwise inter-contact time between nodes $a$ and $b$, denoted by $M_{a,b}$, follows an exponential distribution with rate $\lambda_{a,b}$ ($> 0$), i.e.,

$$P\{M_{a,b} > t\} = \exp(-\lambda_{a,b} t), \quad t \ge 0. \tag{1}$$

⁴ Note that a node may become a seed willingly or unwillingly. For instance, a seed of a virus gets the virus unwillingly.

Suppose that node $a$ is infected and node $b$ is susceptible. From the meeting process between them, we can obtain the infection time $M^{\mathrm{eff}}_{a,b}$ by taking the infectivity $\varphi_a$ and the susceptibility $\psi_b$ into account. From (1), we have:

$$P\{M^{\mathrm{eff}}_{a,b} > t\} = \exp(-\lambda_{a,b}\varphi_a\psi_b t), \quad t \ge 0. \tag{2}$$

That is, the infection rate $\lambda^{\mathrm{eff}}_{a,b}$ between an infected node $a$ and a susceptible node $b$ becomes $\lambda^{\mathrm{eff}}_{a,b} = \lambda_{a,b}\varphi_a\psi_b$. The detailed proof of (2) is given in Appendix A. Since all nodes in the same group are stochastically identical in terms of mobility pattern and epidemic attribute, the rate $\lambda^{\mathrm{eff}}_{a,b}$ is determined by the group indices of nodes $a$ and $b$. Thus, we can rewrite the infection rate as $\lambda^{\mathrm{eff}}_{a,b} = \lambda^{*}_{\mathcal{F}(a),\mathcal{F}(b)}$, where the subscripts $\mathcal{F}(a)$ and $\mathcal{F}(b)$ denote the group indices of nodes $a$ and $b$, respectively. For later use, we define a rate matrix $\mathbf{\Lambda}$ as

$$\mathbf{\Lambda} \triangleq (\lambda^{*}_{k_1,k_2})_{1 \le k_1, k_2 \le K}.$$
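The thinning argument behind (2) can be checked numerically. The following is a minimal sketch, assuming only Python's standard `random` module; the function names and the parameter values are illustrative and do not come from the paper. It simulates contacts of one pair as a Poisson process with rate `lam`, accepts each contact with probability `phi * psi`, and compares the empirical mean infection time with the mean `1 / (lam * phi * psi)` implied by (2).

```python
import random


def effective_rate(lam: float, phi: float, psi: float) -> float:
    """Infection rate of the thinned contact process, as in (2)."""
    return lam * phi * psi


def sample_infection_time(lam: float, phi: float, psi: float, rng: random.Random) -> float:
    """Simulate contacts of one pair until a contact leads to infection."""
    t = 0.0
    while True:
        t += rng.expovariate(lam)      # time until the next contact of the pair
        if rng.random() < phi * psi:   # sender transmits and receiver accepts
            return t


rng = random.Random(0)                 # fixed seed so the run is repeatable
lam, phi, psi = 0.5, 0.8, 0.5          # illustrative values only
samples = [sample_infection_time(lam, phi, psi, rng) for _ in range(20000)]
empirical_mean = sum(samples) / len(samples)
theoretical_mean = 1.0 / effective_rate(lam, phi, psi)
```

With these values the theoretical mean is 5 time units, and the empirical mean should land close to it; the simulation is a sanity check and not a substitute for the proof in Appendix A.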

Our spreading model is general in that it covers a variety of scenarios from homogeneous to totally heterogeneous cases. For instance, when $K = 1$ our model reduces to the homogeneous case where all nodes in the network are identical with the same infection rate $\lambda^{*}_{\mathcal{F}(a),\mathcal{F}(b)} = \lambda^{*}_{1,1}$ ($= \lambda^{*}$) for any $a, b$. On the other hand, when $K = N$ our model yields a totally heterogeneous case where each individual node forms its own group. When $K = 2, \ldots, N-1$, our model is able to capture heterogeneity arising from multiple communities. In addition to heterogeneity, our model is capable of characterizing the impact of various spread parameters, such as the level of contact rates and the population size, by varying the values of the rate matrix $\mathbf{\Lambda}$ and the group sizes $\mathbf{N}$.



C. Performance Metrics

In this subsection, we describe our performance metrics in detail. Let $S_k(t)$ be the number of susceptible nodes in group $k$ at time $t$ ($\ge 0$). Let $I_k(t)$ be the number of infected nodes in group $k$ at time $t$. Then, we have $S_k(t) + I_k(t) = N_k$ for all $k$ and $t$. The key performance metric of our interest is the $\alpha$-completion time as defined below:

Definition 1 ($\alpha$-completion time). For $\alpha \in (0, 1]$, the time required for infecting $\alpha$ fraction of the total population, denoted by $T_\alpha$, is given by:

$$T_\alpha \triangleq \inf\Big\{t : \sum_{k=1}^{K} I_k(t) \ge \alpha N\Big\}. \tag{3}$$

We call $T_\alpha$ the $\alpha$-completion time throughout this paper.

$T_\alpha$ has a strong connection with existing studies that have characterized the average number of infected nodes at time $t$ (i.e., $\sum_k E[I_k(t)]$) using various mathematical tools, because $E[T_\alpha]$ is a dual of $\sum_k E[I_k(t)]$. However, to better understand the spread behavior and to better design spread prevention or acceleration methods, a characterization of the distribution of $T_\alpha$ beyond simply the mean is needed. To this end, we introduce a new metric, called the $(\alpha, \beta)$-guaranteed time, as defined next:

Definition 2 ($(\alpha, \beta)$-guaranteed time). For $\alpha \in (0, 1]$ and $\beta \in (0, 1)$, the minimum time required to guarantee spread to $\alpha$ fraction of the total population with probability $\beta$, denoted by $G_{\alpha,\beta}$, is defined by:

$$G_{\alpha,\beta} \triangleq \inf\{t : P\{T_\alpha > t\} \le 1 - \beta\}. \tag{4}$$

We call $G_{\alpha,\beta}$ the $(\alpha, \beta)$-guaranteed time throughout this paper.

Note that the probability $1 - \beta$ in (4) can be interpreted as an outage probability that the actual spread time $T_\alpha$ exceeds the guaranteed time $G_{\alpha,\beta}$. In this sense, $G_{\alpha,\beta}$ can be used to predict the range of the spread time and the confidence of the prediction: the higher we set the value of $\beta$, the greater the confidence in the prediction. Thus, $G_{\alpha,\beta}$ helps avoid underestimating the resources required for spreading information to a network. The ratio $R_{\alpha,\beta}$ defined below describes just how much $E[T_\alpha]$ underestimates the spread time compared to the guaranteed time.

Definition 3 ($(\alpha, \beta)$-average to guaranteed time ratio). For $\alpha \in (0, 1]$ and $\beta \in (0, 1)$, the ratio $R_{\alpha,\beta}$ is defined by

$$R_{\alpha,\beta} \triangleq \frac{G_{\alpha,\beta}}{E[T_\alpha]}. \tag{5}$$

We call $R_{\alpha,\beta}$ the $(\alpha, \beta)$-average to guaranteed time ratio throughout this paper.

Finally, we define the set of seeds in each group. Let $s_k \triangleq I_k(0)$ be the number of seeds in group $k$. If $\sum_k s_k \ge \alpha N$, then we have the trivial result $T_\alpha = 0$, $G_{\alpha,\beta} = 0$, and $R_{\alpha,\beta} = 1$ for any $\beta \in (0, 1)$. Therefore, in the rest of the paper, we only consider the regime $\sum_k s_k < \alpha N$.
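To make the three metrics concrete, the sketch below estimates them from data rather than from the analytical framework developed later. It assumes that each run of a spreading process is summarized by one infection time per node (seeds at time 0) and that several independent runs are available; the function names are illustrative and the empirical quantile is only an estimate of the infimum in (4).

```python
import math


def completion_time(infection_times, alpha):
    """T_alpha of one run: the instant at which ceil(alpha * N) nodes are infected.

    infection_times holds one infection time per node; seeds have time 0.
    """
    needed = math.ceil(alpha * len(infection_times))
    return sorted(infection_times)[needed - 1]


def guaranteed_time(t_samples, beta):
    """Empirical version of (4): smallest sampled t with P{T > t} <= 1 - beta."""
    ordered = sorted(t_samples)
    index = max(math.ceil(beta * len(ordered)) - 1, 0)
    return ordered[index]


def average_to_guaranteed_ratio(t_samples, beta):
    """Empirical ratio of the guaranteed time to the mean completion time."""
    mean = sum(t_samples) / len(t_samples)
    return guaranteed_time(t_samples, beta) / mean
```

For example, with ten sampled completion times 1, 2, ..., 10 and $\beta = 0.9$, the empirical guaranteed time is 9 (only one sample in ten exceeds it) while the mean is 5.5, so planning with the mean alone would underestimate the time needed by a factor of about 1.6.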



III. Basic Temporal Analysis Framework 

In this section, we develop an analytical framework for deriving the performance metrics defined in (3), (4), and (5). We use the following three steps in our analysis: first, we identify the temporal behavior of the total number of infected nodes $\{\sum_k I_k(t); t \ge 0\}$ (see Lemma 1). Using the result in Lemma 1, we obtain the distribution of the $\alpha$-completion time $T_\alpha$ (see Lemma 2). Finally, we give formulas for our performance metrics (see Lemma 3).

Step 1: According to Definition 1, we need the temporal distribution of the total number of infected nodes $\sum_k I_k(t)$. Directly solving for it appears to be intractable (as illustrated in Example II below for the case $K = 2$). However, we prove that the joint temporal distribution of $I_k(t)$ can be derived from the theory of multi-dimensional CTMCs, which can also be used to identify the distribution of $\sum_k I_k(t)$. The result is summarized in Lemma 1 and its derivation is explained through Examples I and II below.

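Lemma 1 (CTMC model). For $K = 1, 2, \ldots$, let

$$\mathbf{I}(t) \triangleq (I_1(t), I_2(t), \ldots, I_K(t)).$$

Then, the process $\{\mathbf{I}(t); t \ge 0\}$ is a $K$-dimensional CTMC. Further, it has the following properties:

(P1) The state space is given by $\mathcal{E} = \prod_{k=1}^{K}\{0, \ldots, N_k\} \setminus \{\mathbf{0}\}$, and is decomposed into the transient state space $\mathcal{E}^{*}$ and the absorbing state space $\mathcal{E}^{\circ}$ as:

$$\mathcal{E}^{*} \triangleq \{\mathbf{e} \in \mathcal{E} : |\mathbf{e}| < |\mathbf{N}|\}, \qquad \mathcal{E}^{\circ} \triangleq \{\mathbf{N} = (N_1, \ldots, N_K)\}.$$

Without loss of generality, we assume that the states in $\mathcal{E} = \{\mathbf{e}_1, \mathbf{e}_2, \ldots\}$ are arranged so that $|\mathbf{e}_1| \le |\mathbf{e}_2| \le \cdots$.

(P2) By property (P1), the infinitesimal generator $\mathbf{Q}$ of the Markov chain is of the following form:

$$\mathbf{Q} = \begin{pmatrix} \mathbf{F} & \mathbf{F}^{\circ} \\ \mathbf{0} & 0 \end{pmatrix},$$

where $\mathbf{F} = (F_{i,j})$ is a matrix representing the transition rates from $\mathcal{E}^{*}$ to $\mathcal{E}^{*}$, and $\mathbf{F}^{\circ}$ is a column vector representing the transition rates from $\mathcal{E}^{*}$ to $\mathcal{E}^{\circ}$. Due to its importance, we call the matrix $\mathbf{F}$ the fundamental matrix.

(P3) Assume $P\{\mathbf{I}(0) \in \mathcal{E}^{*}\} = 1$. For a given time $t \ge 0$, let $\boldsymbol{\pi}(t) = (P\{\mathbf{I}(t) = \mathbf{e}\})_{\mathbf{e} \in \mathcal{E}^{*}}$ be the distribution of $\mathbf{I}(t)$ on $\mathcal{E}^{*}$. Then, it is determined from its initial distribution $\boldsymbol{\pi}(0)$ and the fundamental matrix $\mathbf{F}$ as [16]:

$$\boldsymbol{\pi}(t) = \boldsymbol{\pi}(0)\exp(\mathbf{F}t).$$

The distribution of $\mathbf{I}(t)$ on $\mathcal{E}^{\circ}$ is then obtained by $P\{\mathbf{I}(t) = \mathbf{N}\} = 1 - |\boldsymbol{\pi}(t)|$.

(P4) Let the $i$th and the $j$th states in $\mathcal{E}$ be denoted by $\mathbf{e}_i = (i_k)_{1 \le k \le K}$ and $\mathbf{e}_j = (j_k)_{1 \le k \le K}$, respectively. Then, $\mathbf{Q} = (Q_{i,j})$ is obtained as:

$$Q_{i,j} = \begin{cases} \sum_{l} \mathbb{1}_{(\mathbf{e}_i,\mathbf{e}_j):l}\,(N_l - i_l) \sum_{k} i_k \lambda^{*}_{k,l}, & \text{if } i \ne j, \\ -\sum_{l \ne i} Q_{i,l}, & \text{if } i = j, \end{cases}$$

where $\mathbb{1}_{(\mathbf{e}_i,\mathbf{e}_j):l} \in \{0, 1\}$ takes the value 1 if and only if $j_l = i_l + 1$ and $j_k = i_k$ for all $k \ne l$. Then, we can obtain $\mathbf{F}$ by restricting $\mathbf{Q}$ to the space $\mathcal{E}^{*}$ as $\mathbf{F} = \mathbf{Q}|_{\mathcal{E}^{*} \times \mathcal{E}^{*}}$, i.e., $\mathbf{F} = (Q_{i,j})$ for all $i, j$ such that $\mathbf{e}_i, \mathbf{e}_j \in \mathcal{E}^{*}$.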

Proof: See Appendix B. ■ 
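The construction in (P1), (P2), and (P4) can be written down directly. The sketch below is one possible pure-Python rendering under the stated model, not the authors' implementation; `build_generator` and its argument names are illustrative. It enumerates the states in order of increasing $|\mathbf{e}|$, fills in the off-diagonal rates $(N_l - i_l)\sum_k i_k \lambda^{*}_{k,l}$, and sets each diagonal entry so that the row sums to zero.

```python
from itertools import product


def build_generator(group_sizes, rates):
    """Return (states, Q) for the K-group SI chain described in Lemma 1.

    group_sizes[k] is N_k. rates[k][l] is the per-pair infection rate from an
    infected node in group k to a susceptible node in group l.
    """
    K = len(group_sizes)
    states = [e for e in product(*(range(n + 1) for n in group_sizes)) if sum(e) > 0]
    states.sort(key=sum)                      # arrange so that |e_1| <= |e_2| <= ...
    index = {e: i for i, e in enumerate(states)}
    Q = [[0.0] * len(states) for _ in states]
    for e, i in index.items():
        for l in range(K):
            if e[l] == group_sizes[l]:
                continue                      # group l has no susceptible node left
            target = e[:l] + (e[l] + 1,) + e[l + 1:]
            pressure = sum(e[k] * rates[k][l] for k in range(K))
            Q[i][index[target]] = (group_sizes[l] - e[l]) * pressure
        Q[i][i] = -sum(Q[i])                  # diagonal makes the row sum to zero
    return states, Q


states, Q = build_generator((1, 1), [[1.0, 2.0], [3.0, 4.0]])
F = [row[:-1] for row in Q[:-1]]              # drop the single absorbing state N
```

Because the states are sorted by $|\mathbf{e}|$ and every transition increases $|\mathbf{e}|$ by one, the resulting matrix is upper triangular, and the absorbing state $\mathbf{N}$ is always the last one, which is why the fundamental matrix is obtained by dropping the last row and column.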

Example I. (Homogeneous model, single community model) We start our analysis with the simplest case of $K = 1$ (i.e., the homogeneous model), and drop the group index in all notations for simplicity. In this case, we have $\mathbf{I}(t) = I(t)$ and $\mathcal{E} = \{1, \ldots, N\}$. Then, we can identify the temporal behavior of $I(t)$ as follows. First note that the process $\{I(t); t \ge 0\}$ is a counting process in that it counts the number of events that have taken place during $(0, t]$. Hence, state transitions occur only to the adjacent state, from $i$ ($= 1, \ldots, N-1$) to $i + 1$, and eventually the system is absorbed into state $N$. Thus, the state space $\mathcal{E}$ is decomposed into the transient state space $\mathcal{E}^{*} = \{1, \ldots, N-1\}$ and the absorbing state space $\mathcal{E}^{\circ} = \{N\}$. Suppose that the system enters state $i \in \mathcal{E}^{*}$ at time $t_0$. Let $X_i$ be the sojourn time of state $i$. Note that the sojourn time is equivalent to the time to have one more infected node, which is the same as the minimum infection time from $i$ infected nodes to $N - i$ susceptible nodes, i.e.,

$$X_i = \min\{M^{\mathrm{eff}}_{a,b};\ a \in \mathcal{I}(t_0),\ b \in \mathcal{S}(t_0)\},$$

where $\mathcal{I}(t)$ and $\mathcal{S}(t)$ denote the index sets of infected nodes and susceptible nodes at time $t$, respectively. Since $M^{\mathrm{eff}}_{a,b} \sim \mathrm{Exp}(\lambda^{*})$ from (2) and is independent across node pairs, we have:

$$X_i \sim \mathrm{Exp}(i(N-i)\lambda^{*}).$$

Therefore, the process $\{I(t); t \ge 0\}$ is a CTMC with the transition diagram depicted in Fig. 1. From the transition diagram, we can easily obtain the matrix $\mathbf{F}$. For details, see Appendix C.

Fig. 1. Transition diagram of the Markov chain $\{I(t); t \ge 0\}$ when $K = 1$.

Example II. (Double community model) We next consider the case $K = 2$. In this case, if we set the state variable as the total number of infected nodes (i.e., $\sum_k I_k(t)$), then it becomes intractable to identify the statistics of the sojourn time $X_i$ of state $i$, unless we know how the infected nodes in the network are distributed over the groups. For this reason, we set the vector $(I_1(t), I_2(t))$ as the state variable. Suppose that at time $t_0$, the system enters state $(i_1, i_2)$. Since the process $\{I_1(t) + I_2(t); t \ge 0\}$ is a counting process, the very next state transition occurs only to either $(i_1 + 1, i_2)$ or $(i_1, i_2 + 1)$, and eventually the system is absorbed into state $(N_1, N_2)$. Hence, the state space $\mathcal{E} = \{(0,1), (1,0), (1,1), \ldots, (N_1, N_2)\}$ is decomposed into the transient state space $\mathcal{E}^{*} = \{\mathbf{e} \in \mathcal{E} : |\mathbf{e}| < N_1 + N_2\}$ and the absorbing state space $\mathcal{E}^{\circ} = \{(N_1, N_2)\}$. For $(i_1, i_2) \in \mathcal{E}^{*}$, let $X_{(i_1,i_2):i_1}$ and $X_{(i_1,i_2):i_2}$ be the time required to infect one additional node in groups 1 and 2, respectively. Then, by a similar reasoning as in Example I, we have

$$X_{(i_1,i_2):i_1} = \min\{M^{\mathrm{eff}}_{a,b};\ a \in \mathcal{I}_1(t_0) \cup \mathcal{I}_2(t_0),\ b \in \mathcal{S}_1(t_0)\},$$

where $\mathcal{I}_k(t)$ and $\mathcal{S}_k(t)$ ($k = 1, 2$) denote the index sets of infected nodes and susceptible nodes in group $k$ at time $t$, respectively. Thus, $X_{(i_1,i_2):i_1}$ follows an exponential distribution as in Example I, but in this case the rate is given by $i_1(N_1 - i_1)\lambda^{*}_{1,1} + i_2(N_1 - i_1)\lambda^{*}_{2,1} = (N_1 - i_1)\sum_{k=1}^{2} i_k \lambda^{*}_{k,1}$. Similarly, $X_{(i_1,i_2):i_2}$ follows an exponential distribution, as summarized below:

$$X_{(i_1,i_2):a} \sim \begin{cases} \mathrm{Exp}\big((N_1 - i_1)\sum_{k=1}^{2} i_k \lambda^{*}_{k,1}\big) & \text{if } a = i_1, \\ \mathrm{Exp}\big((N_2 - i_2)\sum_{k=1}^{2} i_k \lambda^{*}_{k,2}\big) & \text{if } a = i_2. \end{cases} \tag{6}$$

Note that the sojourn time $X_{(i_1,i_2)}$ of state $(i_1, i_2)$ is the minimum of $X_{(i_1,i_2):i_1}$ and $X_{(i_1,i_2):i_2}$. Hence, from (6), $X_{(i_1,i_2)}$ follows an exponential distribution. Therefore, the process $\{(I_1(t), I_2(t)); t \ge 0\}$ is a 2-dimensional CTMC with the transition diagram depicted in Fig. 2. From the transition diagram, we can easily obtain the fundamental matrix $\mathbf{F}$. For details, see Appendix C.

Fig. 2. Transition diagram of the Markov chain $\{(I_1(t), I_2(t)); t \ge 0\}$ when $K = 2$. The rate from $(i_1, i_2)$ to $(i_1 + 1, i_2)$ is $(N_1 - i_1)\sum_{k=1}^{2} i_k \lambda^{*}_{k,1}$ and the rate from $(i_1, i_2)$ to $(i_1, i_2 + 1)$ is $(N_2 - i_2)\sum_{k=1}^{2} i_k \lambda^{*}_{k,2}$.

Fig. 3. An example of $\mathcal{E}_\alpha$ ($= \mathcal{E}^{*}_\alpha \cup \mathcal{E}^{\circ}_\alpha$) for $(N_1, N_2) = (3, 5)$, $\lceil \alpha N \rceil = 4$: shaded states form $\mathcal{E}^{\circ}_\alpha$, and the others form $\mathcal{E}^{*}_\alpha$.
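The sojourn rates of Examples I and II are simple enough to compute by hand, and a short sketch makes the link between the two explicit. The helper names below are illustrative. The total rate out of a two-group state is the sum of the two competing rates, since the minimum of independent exponential variables is exponential with the summed rate.

```python
def sojourn_rate_homogeneous(i, n, lam):
    """Rate of X_i in Example I: i infected nodes facing n - i susceptible nodes."""
    return i * (n - i) * lam


def sojourn_rates_two_groups(i1, i2, n1, n2, rates):
    """Rates out of state (i1, i2) in Example II.

    rates[k][l] is the per-pair rate from group k+1 to group l+1.
    Returns (rate to (i1+1, i2), rate to (i1, i2+1), total sojourn rate).
    """
    to_group1 = (n1 - i1) * (i1 * rates[0][0] + i2 * rates[1][0])
    to_group2 = (n2 - i2) * (i1 * rates[0][1] + i2 * rates[1][1])
    return to_group1, to_group2, to_group1 + to_group2


# Consistency check: with identical rates everywhere, two groups of 2 nodes
# behave like one homogeneous population of 4 nodes.
same = [[1.0, 1.0], [1.0, 1.0]]
two_group_total = sojourn_rates_two_groups(1, 0, 2, 2, same)[2]
one_group_total = sojourn_rate_homogeneous(1, 4, 1.0)
```

When all entries of the rate matrix are equal, the total two-group rate depends on $(i_1, i_2)$ only through $i_1 + i_2$, which is why the scalar state suffices for $K = 1$ but not in general.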

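Step 2: Using the results in Lemma 1, we can derive the distribution of $T_\alpha$. We take two steps: (i) first, we truncate the state space $\mathcal{E}$ to $\mathcal{E}_\alpha = \{\mathbf{e} \in \mathcal{E} : |\mathbf{e}| \le \lceil \alpha N \rceil\}$, where $\lceil x \rceil$ denotes the smallest integer greater than or equal to $x$; (ii) next, we split the truncated state space $\mathcal{E}_\alpha$ into the transient state space $\mathcal{E}^{*}_\alpha$ and the absorbing state space $\mathcal{E}^{\circ}_\alpha$ as:

$$\mathcal{E}^{*}_\alpha \triangleq \{\mathbf{e} \in \mathcal{E}_\alpha : |\mathbf{e}| < \lceil \alpha N \rceil\}, \qquad \mathcal{E}^{\circ}_\alpha \triangleq \{\mathbf{e} \in \mathcal{E}_\alpha : |\mathbf{e}| = \lceil \alpha N \rceil\}.$$

On the state space $\mathcal{E}^{*}_\alpha \cup \mathcal{E}^{\circ}_\alpha$, we define a truncated process $\mathbf{I}_\alpha(t)$ from the process $\mathbf{I}(t)$ as follows: $\mathbf{I}_\alpha(t)$ evolves according to $\mathbf{I}(t)$ as long as $\mathbf{I}(t) \notin \mathcal{E}^{\circ}_\alpha$. When $\mathbf{I}(t)$ enters one of the states in $\mathcal{E}^{\circ}_\alpha$, say $\mathbf{e}$, truncation happens and $\mathbf{I}_\alpha(t)$ is absorbed into the state $\mathbf{e}$. Then, by Lemma 1 the process $\{\mathbf{I}_\alpha(t); t \ge 0\}$ is a $K$-dimensional CTMC with possibly multiple absorbing states in $\mathcal{E}^{\circ}_\alpha$. Moreover, by Definition 1, $T_\alpha$ is the time taken by the truncated Markov chain to be absorbed into $\mathcal{E}^{\circ}_\alpha$. An example of the transition diagram is shown in Fig. 3.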
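The truncation step can be illustrated by enumerating the two state sets for the setting of Fig. 3. This is a small sketch with an illustrative function name; it only lists states and does not build the transition rates.

```python
import math
from itertools import product


def truncated_spaces(group_sizes, alpha):
    """Split the truncated state space into transient and absorbing states."""
    target = math.ceil(alpha * sum(group_sizes))     # ceil(alpha * N)
    states = [e for e in product(*(range(n + 1) for n in group_sizes)) if sum(e) > 0]
    transient = [e for e in states if sum(e) < target]
    absorbing = [e for e in states if sum(e) == target]
    return transient, absorbing


# Setting of Fig. 3: (N1, N2) = (3, 5); alpha = 0.5 gives ceil(alpha * N) = 4.
transient, absorbing = truncated_spaces((3, 5), 0.5)
```

In this setting there are four absorbing states, one for each way of splitting four infected nodes over the two groups with at most three of them in group 1, which shows why the truncated chain can have multiple absorbing states.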

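Similarly to (P2) in Lemma 1, the infinitesimal generator $\mathbf{Q}_\alpha$ of the process $\{\mathbf{I}_\alpha(t); t \ge 0\}$ is of the following form:

$$\mathbf{Q}_\alpha = \begin{pmatrix} \mathbf{F}_\alpha & \mathbf{F}^{\circ}_\alpha \\ \mathbf{0} & \mathbf{0} \end{pmatrix}.$$

Here, $\mathbf{F}_\alpha$ is a matrix representing the transition rates from $\mathcal{E}^{*}_\alpha$ to $\mathcal{E}^{*}_\alpha$, and can be obtained from the fundamental matrix $\mathbf{F}$ of the original process $\{\mathbf{I}(t); t \ge 0\}$ by

$$\mathbf{F}_\alpha = \mathbf{F}|_{\mathcal{E}^{*}_\alpha \times \mathcal{E}^{*}_\alpha} \ \big(= \mathbf{Q}|_{\mathcal{E}^{*}_\alpha \times \mathcal{E}^{*}_\alpha}\big). \tag{7}$$

Similarly, $\mathbf{F}^{\circ}_\alpha$ is a matrix representing the transition rates from $\mathcal{E}^{*}_\alpha$ to $\mathcal{E}^{\circ}_\alpha$, and is obtained by $\mathbf{F}^{\circ}_\alpha = \mathbf{Q}|_{\mathcal{E}^{*}_\alpha \times \mathcal{E}^{\circ}_\alpha}$. Therefore, the value of $\alpha$ determines where to truncate the matrix $\mathbf{F}$ or $\mathbf{Q}$ in Lemma 1 and how to redefine the transient and absorbing state spaces. For values of $\alpha$ satisfying $\lceil \alpha N \rceil = N$, we have $\mathcal{E}^{*}_\alpha = \mathcal{E}^{*}$ and $\mathcal{E}^{\circ}_\alpha = \mathcal{E}^{\circ}$, which gives $\mathbf{F}_\alpha = \mathbf{F}$ and $\mathbf{F}^{\circ}_\alpha = \mathbf{F}^{\circ}$.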

Once we have the truncated fundamental matrix $\mathbf{F}_\alpha$ from the original matrix $\mathbf{F}$, we can obtain the distribution of $T_\alpha$ as in the following lemma.

Lemma 2 (Distribution of $T_\alpha$). The cumulative distribution function (CDF) of the $\alpha$-completion time is given by

$$H_\alpha(t) \triangleq P\{T_\alpha \le t\} = 1 - \mathbf{h}_\alpha \exp(\mathbf{F}_\alpha t)\mathbf{1},$$

where $\mathbf{h}_\alpha = (P\{\mathbf{I}_\alpha(0) = \mathbf{e}\})_{\mathbf{e} \in \mathcal{E}^{*}_\alpha}$ is a row vector denoting the initial distribution, $\mathbf{F}_\alpha$ is given in (7), and $\mathbf{1}$ is a column vector of ones. In addition, it can be expressed in the following form [17]:

$$H_\alpha(t) = 1 - \sum_{i} \exp(-|\rho_i| t) P_i(t),$$

where $\rho_i$ denotes the $i$th eigenvalue of $\mathbf{F}_\alpha$ with multiplicity denoted by $m_i$, and $P_i(t)$ is an $(m_i - 1)$th order polynomial function of $t$. Since $\mathbf{F}_\alpha$ is an upper triangular matrix, the eigenvalues are the distinct diagonal elements of the matrix, which are all real and negative.

Proof: See Appendix D. ■ 
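The CDF in Lemma 2 can be evaluated without a linear algebra library. The sketch below uses uniformization, a standard way to compute $\mathbf{h}\exp(\mathbf{F}t)\mathbf{1}$ for a CTMC; this technique is swapped in here for illustration and is not taken from the paper. The helper `homogeneous_F` builds $\mathbf{F}_\alpha$ for $K = 1$ only, and all names are illustrative.

```python
import math


def homogeneous_F(n, seeds, alpha, lam):
    """Truncated fundamental matrix for K = 1.

    Transient states are seeds, seeds + 1, ..., ceil(alpha * n) - 1.
    """
    target = math.ceil(alpha * n)
    size = target - seeds
    F = [[0.0] * size for _ in range(size)]
    for r in range(size):
        i = seeds + r
        rate = i * (n - i) * lam
        F[r][r] = -rate
        if r + 1 < size:
            F[r][r + 1] = rate
    return F


def completion_cdf(F, h, t, tol=1e-12, max_terms=100000):
    """H(t) = 1 - h exp(F t) 1, evaluated by uniformization.

    Suitable for moderate q * t; exp(-q * t) underflows for very large values.
    """
    n = len(F)
    q = max(-F[i][i] for i in range(n))               # uniformization rate
    P = [[(1.0 if i == j else 0.0) + F[i][j] / q for j in range(n)] for i in range(n)]
    v = list(h)                                       # mass on transient states
    weight = math.exp(-q * t)                         # Poisson(q t) weight for k = 0
    acc = survival = 0.0
    k = 0
    while acc < 1.0 - tol and k < max_terms:
        survival += weight * sum(v)
        acc += weight
        k += 1
        weight *= q * t / k
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
    return 1.0 - survival


F = homogeneous_F(3, 1, 1.0, 1.0)                     # N = 3, one seed, alpha = 1
value = completion_cdf(F, [1.0, 0.0], 1.0)
```

For $N = 3$ with one seed and $\lambda^{*} = 1$, both sojourn rates equal 2, so $T_1$ is Erlang distributed and $H_1(t) = 1 - e^{-2t}(1 + 2t)$. This is an instance of the second form in Lemma 2 with a single eigenvalue of multiplicity 2 and a first-order polynomial.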

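Step 3: Based on Lemma 2, we can derive formulas for our performance metrics, as shown in Lemma 3.

Lemma 3 (Formulas for $G_{\alpha,\beta}$ and $R_{\alpha,\beta}$). The inverse function of the distribution function $H_\alpha(\cdot)$ in Lemma 2 exists and yields the $(\alpha, \beta)$-guaranteed time in (4) as

$$G_{\alpha,\beta} = H_\alpha^{-1}(\beta). \tag{8}$$

The fundamental matrix $\mathbf{F}_\alpha$ is invertible, and its inverse matrix gives the $n$th moment of $T_\alpha$ ($n = 1, 2, \ldots$) as

$$E[(T_\alpha)^n] = n!\,\mathbf{h}_\alpha(-\mathbf{F}_\alpha)^{-n}\mathbf{1}. \tag{9}$$

Therefore, the ratio $R_{\alpha,\beta}$ is obtained from (8) and (9) by

$$R_{\alpha,\beta} = \frac{H_\alpha^{-1}(\beta)}{\mathbf{h}_\alpha(-\mathbf{F}_\alpha)^{-1}\mathbf{1}}.$$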

Proof: See Appendix E. ■ 
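As a worked instance of Lemma 3, the sketch below computes the mean from (9) with $n = 1$ by back substitution, which works because the fundamental matrix is upper triangular, and inverts the CDF in (8) by bisection. The bisection is an assumed numerical choice and not the paper's method. The example CDF is the closed form for the homogeneous case $N = 3$, one seed, $\lambda^{*} = 1$, namely $H(t) = 1 - e^{-2t}(1 + 2t)$; all function names are illustrative.

```python
import math


def mean_from_F(F, h):
    """E[T] = h (-F)^{-1} 1 for an upper-triangular sub-generator F."""
    n = len(F)
    x = [0.0] * n                                  # x solves (-F) x = 1
    for i in reversed(range(n)):
        rhs = 1.0 + sum(F[i][j] * x[j] for j in range(i + 1, n))
        x[i] = rhs / (-F[i][i])
    return sum(h[i] * x[i] for i in range(n))


def guaranteed_time(cdf, beta, tol=1e-10):
    """G = H^{-1}(beta) by bisection; cdf must be continuous and increasing."""
    lo, hi = 0.0, 1.0
    while cdf(hi) < beta:                          # grow the bracket until it contains G
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < beta:
            lo = mid
        else:
            hi = mid
    return hi


def erlang_cdf(t):
    """Closed-form H(t) for N = 3, one seed, unit rate."""
    return 1.0 - math.exp(-2.0 * t) * (1.0 + 2.0 * t)


F = [[-2.0, 2.0], [0.0, -2.0]]
mean = mean_from_F(F, [1.0, 0.0])                  # equals 1/2 + 1/2
G = guaranteed_time(erlang_cdf, 0.99)
R = G / mean
```

In this small case the mean completion time is 1, while the time needed for a 99% guarantee is more than three times as long, which is the kind of gap the ratio $R_{\alpha,\beta}$ is meant to expose.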

Major applications leveraging $G_{\alpha,\beta}$ ($= G_{\alpha,\beta}(\mathbf{\Lambda}, \mathbf{s})$) include the following:

1) For distributing a firmware or software update to smartphones (and tablets) through opportunistic contacts among nodes, when cellular network carriers wish to avoid overusing network resources while guaranteeing the time to deliver the update with more than 99% confidence, $G_{\alpha,\beta}$ is useful for determining the required number of seeds in the network. For instance, to guarantee delivery with probability $\beta$ to $\alpha$ fraction of nodes within time $T_{\mathrm{bound}}$, the number of seeds $\mathbf{s}$ that directly get the update from the carriers can be determined from:

$$\mathbf{s} = G_{\alpha,\beta}^{-1}(\mathbf{\Lambda}, T_{\mathrm{bound}}).$$

2) For an autonomous disaster broadcasting system, which purely leverages opportunistic contacts without relying on network infrastructure, the target level of infection rates $\mathbf{\Lambda}$ that achieves a desired time bound $T_{\mathrm{bound}}$ can be determined by:

$$\mathbf{\Lambda} = G_{\alpha,\beta}^{-1}(T_{\mathrm{bound}}, \mathbf{s})$$

for given $(\alpha, \beta)$. Based on this prediction, we can scale the infection rates $\mathbf{\Lambda}$ among nodes up or down by controlling the communication ranges of mobile devices.

3) For a highly contagious disease that has emerged in a city, if the medical facilities in the city have capacity for up to $\alpha$ portion of the citizens, who typically have infection rates $\mathbf{\Lambda}$, the regional government can estimate the time allowed for executing emergency plans by referring to:

$$T_{\mathrm{bound}} = G_{\alpha,\beta}(\mathbf{\Lambda}, \mathbf{s}).$$
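The first application amounts to a search over the number of seeds. The sketch below does this for the homogeneous case ($K = 1$) only, where $T_\alpha$ with $s$ seeds is a sum of independent exponential sojourn times with rates $i(N - i)\lambda^{*}$ for $i = s, \ldots, \lceil \alpha N \rceil - 1$. The CDF is evaluated by uniformization and inverted by bisection; both are assumed numerical choices, and every function name and parameter value is illustrative.

```python
import math


def cdf_sum_of_exponentials(rates, t, tol=1e-12, max_terms=100000):
    """P{X_1 + ... + X_m <= t} for independent X_i ~ Exp(rates[i]).

    Uses uniformization of the pure-birth chain that leaves stage i at rate
    rates[i]. Suitable for moderate max(rates) * t.
    """
    q = max(rates)
    v = [1.0] + [0.0] * (len(rates) - 1)     # probability of each unfinished stage
    weight = math.exp(-q * t)
    acc = survival = 0.0
    k = 0
    while acc < 1.0 - tol and k < max_terms:
        survival += weight * sum(v)
        acc += weight
        k += 1
        weight *= q * t / k
        nxt = [0.0] * len(v)
        for i, r in enumerate(rates):        # stay w.p. 1 - r/q, advance w.p. r/q
            nxt[i] += v[i] * (1.0 - r / q)
            if i + 1 < len(v):
                nxt[i + 1] += v[i] * r / q
        v = nxt
    return 1.0 - survival


def guaranteed_time(n, seeds, alpha, lam, beta):
    """G_{alpha,beta} for K = 1 with the given number of seeds."""
    target = math.ceil(alpha * n)
    rates = [i * (n - i) * lam for i in range(seeds, target)]
    lo, hi = 0.0, 1.0
    while cdf_sum_of_exponentials(rates, hi) < beta:
        hi *= 2.0
    for _ in range(60):                      # bisection on the increasing CDF
        mid = 0.5 * (lo + hi)
        if cdf_sum_of_exponentials(rates, mid) < beta:
            lo = mid
        else:
            hi = mid
    return hi


def min_seeds(n, alpha, lam, beta, t_bound):
    """Smallest seed count whose guaranteed time does not exceed t_bound."""
    target = math.ceil(alpha * n)
    for s in range(1, target):
        if guaranteed_time(n, s, alpha, lam, beta) <= t_bound:
            return s
    return target                            # seeding ceil(alpha * N) nodes gives T = 0
```

The linear scan returns the minimum because, in the homogeneous case, adding a seed removes the first sojourn stage and therefore can only shorten the completion time. For multiple groups the search would also have to decide how to split the seeds across groups, which this sketch does not attempt.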

IV. Analytical Characteristics and Applications 

In this section, we present analytical characteristics derived from our framework, and show how to use these characteristics in practical applications.

A. Impact of the level of infection rates 

The behavior of information spread is determined by various spreading factors. Using our framework, we first answer the question of how the level of the infection rates $\lambda^{\mathrm{eff}}_{a,b}$ affects the distribution of the $\alpha$-completion time.

Theorem 1 (Impact of the level of infection rates). Suppose that the infection rate $\lambda^{\mathrm{eff}}_{a,b}$ is scaled by $\gamma$ ($> 0$) times for all $a, b$. Let $\tilde{T}_\alpha$, $\tilde{G}_{\alpha,\beta}$, and $\tilde{R}_{\alpha,\beta}$ be the counterparts of $T_\alpha$, $G_{\alpha,\beta}$, and $R_{\alpha,\beta}$ after the scaling, respectively. Then, for any $\alpha \in (0, 1]$, we have

$$\tilde{T}_\alpha \overset{d}{=} \gamma^{-1} T_\alpha, \tag{10}$$

where $\overset{d}{=}$ denotes "equal in distribution." The relationship in (10) yields, for any $\alpha \in (0, 1]$ and $\beta \in (0, 1)$, the following:

$$\tilde{G}_{\alpha,\beta} = \gamma^{-1} G_{\alpha,\beta},$$
$$E[(\tilde{T}_\alpha)^n] = \gamma^{-n} E[(T_\alpha)^n].$$

Hence, $\tilde{R}_{\alpha,\beta} = R_{\alpha,\beta}$.

Proof: See Appendix F. ■ 
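The scaling in Theorem 1 can be observed in a small simulation of the homogeneous case. The sketch below is an illustration under stated assumptions, not a proof: it draws $T_\alpha$ as a sum of exponential sojourn times and reuses the same random seeds before and after scaling the rate by $\gamma$, so that each scaled sample is the unscaled sample divided by $\gamma$. Function names and parameter values are illustrative.

```python
import math
import random


def sample_completion_time(n, seeds, alpha, lam, rng):
    """One draw of T_alpha for K = 1 as a sum of exponential sojourn times."""
    target = math.ceil(alpha * n)
    return sum(rng.expovariate(i * (n - i) * lam) for i in range(seeds, target))


def empirical_quantile(samples, beta):
    """Smallest sample t such that at least a beta fraction of samples is <= t."""
    ordered = sorted(samples)
    return ordered[math.ceil(beta * len(ordered)) - 1]


gamma = 4.0
runs = 2000
# Run k uses random.Random(k) in both settings, which couples the two sample sets.
base = [sample_completion_time(30, 1, 0.5, 0.05, random.Random(k)) for k in range(runs)]
scaled = [sample_completion_time(30, 1, 0.5, 0.05 * gamma, random.Random(k)) for k in range(runs)]

mean_base = sum(base) / runs
mean_scaled = sum(scaled) / runs
g_base = empirical_quantile(base, 0.95)
g_scaled = empirical_quantile(scaled, 0.95)
```

Under this coupling, both the empirical mean and the empirical 95% quantile shrink by the factor $\gamma$, so their ratio is unchanged, in line with $\tilde{R}_{\alpha,\beta} = R_{\alpha,\beta}$.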

The result in Theorem 1 shows that the spread becomes faster in proportion to the level of infection rates, in the distributional sense. Similarly, we show that the average $M(t) = \sum_k E[I_k(t)]$ and its time derivative $V(t) = \frac{d}{dt}M(t)$ scale respectively as $\tilde{M}(t) = M(\gamma t)$ and $\tilde{V}(t) = \gamma V(\gamma t)$ for all $t \ge 0$. The detailed proof is given in Appendix G.

B. Impact of population size 

We next characterize the impact of population size on information spread. In our epidemic model, each non-seed node can be considered as a workload to finish. However, once a node becomes infected, it works in a similar manner as a seed and takes part in spreading the information. Hence, it is not obvious whether a larger population accelerates or slows down information spread. Our framework gives the answer, as shown in Theorem 2.

Theorem 2 (Impact of population size). Suppose $\alpha = 1$ (i.e., spread completion), $K = 1$ (i.e., homogeneous model), and $s_1 = 1$ (i.e., one seed). As the population size $N$ increases, we have

• $G_{\alpha,\beta}$ is strictly decreasing for sufficiently large $\beta$.

• $E[T_\alpha]$ is strictly decreasing.

In addition, they scale respectively as

• $G_{\alpha,\beta} = \Theta\big((\lambda^{*})^{-1} N^{-1} (\log N - \log(\log\frac{1}{\beta}))\big)$.

• $E[T_\alpha] = \Theta\big((\lambda^{*})^{-1} N^{-1} \log N\big)$.

Hence, $R_{\alpha,\beta}$ scales as $\Theta(1)$.

Proof: See Appendix H. ■ 

The results in Theorem 2 indicate that adding a node in 
the system accelerates the information spread when per-pair 
infection rates are unchanged. 

Remark 1. To assist the understanding of Theorem 2, we consider a non-cooperative spread model, where only the seed chosen at the beginning spreads the information.⁵ As the population size $N$ increases, we have

• $G_{\alpha,\beta}$ is strictly increasing for sufficiently large $\beta$.

• $E[T_\alpha]$ is strictly increasing.

In addition, they scale respectively as

• $G_{\alpha,\beta} = \Theta\big((\lambda^{*})^{-1} (\log N - \log(\log\frac{1}{\beta}))\big)$.

• $E[T_\alpha] = \Theta\big((\lambda^{*})^{-1} \log N\big)$.

Hence, $R_{\alpha,\beta}$ scales as $\Theta(1)$.

More properties of our model (namely, the cooperative model) and the non-cooperative model are compared in the following table:

| | Cooperative model | Non-cooperative model |
|---|---|---|
| Variance of $T_\alpha$ | Strictly decreases with $N$ and scales as $\Theta((\lambda^{*}N)^{-2})$ | Strictly increases with $N$ and converges to $(\lambda^{*})^{-2}\zeta(2)$ |
| Skewness of $T_\alpha$ | Strictly decreases with $N$ and scales as $\Theta((\lambda^{*}N)^{-3})$ | Strictly increases with $N$ and converges to $(\lambda^{*})^{-3}\zeta(3)$ |
| | $E[(T_\alpha)^n] < \infty$ | $E[(T_\alpha)^n] = \infty$ |

In the table, $\zeta(c) = \sum_{n=1}^{\infty} n^{-c}$ denotes the Riemann zeta function. The proof of the results in Remark 1 is given in Appendix I. Our analysis, which shows that $G_{\alpha,\beta}$ behaves differently under the scaling of $N$ and of $\lambda^{*}$, indicates that resource allocation for information spread should be carefully designed based on the willingness to cooperate in a spread process (i.e., the infectivity in a spread process).
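The contrast between the two models can be reproduced from the sojourn rates alone. The sketch below assumes $\alpha = 1$, $K = 1$, and one seed, as in Theorem 2. In the cooperative model the rate with $i$ infected nodes is $i(N - i)\lambda^{*}$; in the non-cooperative model only the seed spreads, so with $i$ susceptible nodes remaining the rate is $i\lambda^{*}$. Since $T_\alpha$ is a sum of independent exponential sojourn times, its mean and variance are sums of $1/r$ and $1/r^2$ over the rates $r$. The function names are illustrative.

```python
import math


def cooperative_moments(n, lam):
    """Mean and variance of T_1 when every infected node keeps spreading."""
    rates = [i * (n - i) * lam for i in range(1, n)]
    return sum(1.0 / r for r in rates), sum(1.0 / r ** 2 for r in rates)


def non_cooperative_moments(n, lam):
    """Mean and variance of T_1 when only the single seed spreads."""
    rates = [i * lam for i in range(1, n)]   # i = number of susceptible nodes left
    return sum(1.0 / r for r in rates), sum(1.0 / r ** 2 for r in rates)


coop_means = [cooperative_moments(n, 1.0)[0] for n in range(3, 60)]
solo_means = [non_cooperative_moments(n, 1.0)[0] for n in range(3, 60)]
solo_variance_large_n = non_cooperative_moments(5000, 1.0)[1]
zeta_2 = math.pi ** 2 / 6.0                  # Riemann zeta(2)
```

For $N \ge 3$ the cooperative mean decreases with $N$ while the non-cooperative mean grows like the harmonic number $\sum_{i=1}^{N-1} 1/i$, and the non-cooperative variance approaches $\zeta(2)/(\lambda^{*})^2$, matching the first row of the table.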

C. Impact of multiple community 

The impact of heterogeneity on information or virus spreading has been less explored. Using our CTMC-based framework, we analyze the temporal spread behavior in a heterogeneous network with multiple groups and compare it with that in a homogeneous network. In particular, we focus on answering "Does heterogeneity persistently expedite the spreading or not?", "Is there an optimal heterogeneity level for information spread?", and "Is there an upper or a lower bound on the gain from heterogeneity over homogeneity?".

In this subsection, we answer these questions by studying the dual community model ($K = 2$). Note that our framework can be easily extended to study the cases with $K \ge 3$. In order to focus on heterogeneity arising from multiple communities, we make the following assumptions: (i) the two groups are of the same size, $N_1 = N_2$ ($= N/2$); (ii) the inter-group infection rates are the same in both directions, i.e., $\lambda^{*}_{1,2} = \lambda^{*}_{2,1}$; (iii) there is one seed. Without loss of generality, the seed is chosen arbitrarily from group 1.

5 In epidemiological terms, this non-cooperative model is classified as an SIR model with zero recovery time from infection.

Fig. 4. Comparison of the $(\alpha,\beta)$-guaranteed time $G_{\alpha,\beta}$ with the homogeneous model for $\beta = 0.9$ and $\alpha = 0.3, 0.5, 0.7, 1.0$: if $(\gamma_1,\gamma_2) \in \Gamma_{\alpha,\beta}$, then heterogeneity in multiple community accelerates the information spread (i.e., reduces the guaranteed time $G_{\alpha,\beta}$). If $(\gamma_1,\gamma_2) \notin \Gamma_{\alpha,\beta}$, then heterogeneity slows down the information spread.



Let $\gamma_1 = \lambda^*_{1,1}/\lambda^*_{1,2}$ and $\gamma_2 = \lambda^*_{2,2}/\lambda^*_{1,2}$. The values of $\gamma_1$ and $\gamma_2$ control the intra-group infection rates, and are chosen freely in the range $0 < \gamma_1, \gamma_2 < \infty$. Note that $(\gamma_1,\gamma_2) = (1,1)$ reduces to the homogeneous model, and a larger deviation from $(1,1)$ induces more heterogeneity. For a fair comparison with a homogeneous model of size $N$ and infection rate $\lambda^*$, we use the following constraint, which represents the same average infection rate:



$$\lambda^* = \frac{\sum_{k=1}^{2} N_k(N_k-1)\,\lambda^*_{k,k} + 2N_1N_2\,\lambda^*_{1,2}}{N(N-1)}. \quad (11)$$



With the help of Theorems 1 and 2, which show the scaling with respect to $\lambda^*$ and $N$, we can characterize and generalize the impact of heterogeneity by observing only a specific setting of $(\lambda^*, N)$. For simplicity, we choose $(\lambda^*, N) = (1, 40)$. We then vary $(\gamma_1,\gamma_2)$ in the range $0 < \gamma_1, \gamma_2 < 20$. From Lemma 3, we obtain the $(\alpha,\beta)$-guaranteed time $G_{\alpha,\beta}$ and compare it with its homogeneous counterpart. Fig. 4 shows the result. In the figure, $\Gamma_{\alpha,\beta}$ is the region such that if $(\gamma_1,\gamma_2) \in \Gamma_{\alpha,\beta}$, then heterogeneity yields a smaller guaranteed time $G_{\alpha,\beta}$ than the homogeneous model, and vice versa. Hence, the region $\Gamma_{\alpha,\beta}$ can be interpreted as the area where heterogeneity accelerates the information spread. From the figure, we observe the following: (i) As $\alpha$ increases, the region $\Gamma_{\alpha,\beta}$ shrinks. Hence, for a fixed $(\gamma_1,\gamma_2)$, there exists a threshold $\alpha_{th}$ such that $(\gamma_1,\gamma_2) \in \Gamma_{\alpha,\beta}$ if $\alpha < \alpha_{th}$ and $(\gamma_1,\gamma_2) \notin \Gamma_{\alpha,\beta}$ if $\alpha > \alpha_{th}$. In addition, the threshold decreases as $(\gamma_1,\gamma_2)$ deviates from $(1,1)$. This implies that heterogeneity accelerates the spread in the beginning phase (i.e., $\alpha < \alpha_{th}$) while slowing it down in the ending phase (i.e., $\alpha > \alpha_{th}$), and that the portion of time being accelerated shrinks with more heterogeneity. (ii) Over $\alpha \in \{0.3, 0.5, 0.7, 1.0\}$, there is a non-empty region $\bigcap_{\alpha}\Gamma_{\alpha,\beta}$ where heterogeneity always accelerates the information spread (i.e., $\alpha_{th} = 1$). (iii) In the region $\{(\gamma_1,\gamma_2) : \gamma_1 < \gamma_2\}$, heterogeneity always slows down the information spread. That is, if the seed is chosen from the less infective group, then heterogeneity never accelerates the information spread.

As a special case, we consider a system where the inter-group infection rate is determined from the intra-group infection rates by $\lambda^*_{1,2} = (\lambda^*_{1,1} + \lambda^*_{2,2})/2$, and the seed is chosen from the more infective group. Let $\gamma = \max\{\lambda^*_{1,1}, \lambda^*_{2,2}\} / \min\{\lambda^*_{1,1}, \lambda^*_{2,2}\}$. For fixed $(\lambda^*, N) = (1, 40)$ as above, we vary $\gamma$ as $\gamma = 1, 2, 4, 8$, and show the $(\alpha,\beta)$-guaranteed time $G_{\alpha,\beta}$ in Fig. 5. From the figure, we confirm that heterogeneity indeed accelerates the spread for smaller penetration (i.e., for low $\alpha$) but slows it down for higher penetration. This observation is proved in Theorem 3.

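Theorem 3 (Impact of multiple community). Let

$$D_\alpha(\gamma) = -\lim_{t\to\infty}\frac{\log P\{T_\alpha(\gamma) > t\}}{t},$$

where $T_\alpha(\gamma)$ denotes the $\alpha$-completion time when $\gamma$ is used. Then, $D_\alpha(\gamma)$ exists and satisfies the following:
• If $\alpha \le 1 - \frac{2}{N}$, then $\frac{\partial}{\partial\gamma}D_\alpha(\gamma) > 0$ for all $\gamma > 1$.
• If $\alpha = 1$, then $\frac{\partial}{\partial\gamma}D_\alpha(\gamma) < 0$ for all $\gamma > 1$.
• If $1 - \frac{2}{N} < \alpha < 1$, then $\frac{\partial}{\partial\gamma}D_\alpha(\gamma) > 0$ for $\gamma < \frac{5N-16}{N-4}$ and $\frac{\partial}{\partial\gamma}D_\alpha(\gamma) < 0$ for $\gamma > \frac{5N-16}{N-4}$.
Proof: See Appendix J. ■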

D. Contribution of each node to the information spread 

In this section, we provide a method for quantifying the contribution of each individual node to the information spread. The quantification can be useful, e.g., for cellular carriers in incentivizing a node that helps to alleviate the data deluge in cellular networks by distributing packets through opportunistic contacts among nodes. Such an evaluation tool is of importance especially when nodes have heterogeneous attributes in spreading the information. Let $C_i$ denote the degree of contribution of node $i$ to the spread. In this work, we evaluate $C_i$ by using the concept of the Shapley value [18], which is known as a good metric for measuring the surplus (or the contribution) of a node in cooperative game theory:

$$C_i = \frac{G_{\alpha,\beta}(\mathcal{N}\setminus\{i\})}{G_{\alpha(N-1)/N,\,\beta}(\mathcal{N})}, \quad (12)$$

where $\mathcal{N} = \{1, \ldots, N\}$ is the index set of nodes, and for an index set $\mathcal{A}$, $G_{\alpha,\beta}(\mathcal{A})$ is the $(\alpha,\beta)$-guaranteed time for the network consisting of nodes $i \in \mathcal{A}$. Hence, the numerator and the denominator in (12) denote the guaranteed times in the network without and with the node $i$, respectively, and consequently a node $i$ with a high contribution to the information spread has a large $C_i$ value. Due to the page limitation, we omit the detailed application of the metric $C_i$ and its analysis.

E. Applications 

How to optimally distribute given resources to the nodes in a network so as to minimize the time for spreading information to the network is an important research question. Our results presented in this section provide an initial understanding of this question. Theorem 3 proves that as the number of nodes $N$ increases, heterogeneity in $\lambda$ expedites the spread of information for most of the time, except for some time duration at the end of the spread, and this duration converges to zero as $N$ goes to infinity. It is important to point out that our results imply the existence of a small region of $\lambda$ with heterogeneous contact rates that always makes the spread faster than in a network with homogeneous $\lambda$. By applying these two observations to the design of a network, we have the following applications:

1) For a network delivering information to a community 
using vehicles or message ferries (e.g., DakNet [19], 
DieselNet [20], and ZebraNet [21]) of which total amount 
of fuel is given, the amount of fuel distributed to each 
vehicle can be asymmetric to guarantee faster spread of 
information all the time compared to symmetric distribu- 
tion. 

2) When the number of nodes in a network is extremely large (e.g., users of Facebook), advertising a product to the network can be expedited by providing users with incentives to forward information to others in a highly skewed manner. Our results indicate that evenly distributed incentives over the entire population would lead to much slower spreading than unevenly distributed incentives. This means that the same speed of spread can be achieved with a smaller total amount of incentive when the incentives are optimally distributed with an understanding of skewness.

V. Simulation Study 

We study the efficacy of our framework and characterizations using by far the largest vehicular mobility trace, obtained from more than a thousand taxies in Shanghai, China [22].⁶ The experimental trace tracked the GPS coordinates of the taxies every 30 seconds for 28 days in Shanghai. The trace was analyzed in [23], where it was shown that the taxies have exponentially distributed pairwise inter-contact times, which is well aligned with our CTMC-based framework.

Figs. 6 (a), (b), and (c) characterize the statistics of the taxi network with 1000 randomly chosen taxies in terms of the number of contacts, the number of neighbors within a communication range (50 meters in our analysis), and the contact duration, respectively. We apply these three factors to evaluate the effective contact rates $\lambda^{\mathrm{eff}}_{a,b} = \lambda_{a,b}\varphi_a\psi_b$ derived in (2), where $\varphi_a = 1$ and $\psi_b$ is 1 over the average number of neighbors multiplied by the expected number of contacts needed to make a successful data transfer. Note that the latter is derived from the contact time distribution and the time required for a data transfer. The results for a homogeneous network (i.e., $\lambda^*$) and for a heterogeneous network with two groups (i.e., $\lambda^*_{1,1}$, $\lambda^*_{2,2}$, and $\lambda^*_{1,2}$) are summarized in Table I. Note that the infection rates in Table I satisfy the constraint in (11), which was introduced for a fair comparison between a homogeneous model and a heterogeneous model. Based on the statistics in Table I, we can predict the information spread time and examine possible methods to properly allocate resources for the taxi network.

6 Our framework is applicable to various networks including taxi networks. Due to the availability of data, we limit the simulation study to a taxi network.

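TABLE I

Infection rates for a homogeneous network and for a heterogeneous network with two groups of taxies.

| Homogeneous network | Heterogeneous network | | |
|---|---|---|---|
| $\lambda^*$ | $\lambda^*_{1,1}$ | $\lambda^*_{2,2}$ | $\lambda^*_{1,2}\,(= \lambda^*_{2,1})$ |
| $4.14 \cdot 10^{-4}$ | $7.17 \cdot 10^{-4}$ | $1.93 \cdot 10^{-4}$ | $3.72 \cdot 10^{-4}$ |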


Based on Table I, we simulate probabilistic guarantees for the completion time in a homogeneous and a heterogeneous network, each with 100 taxies. We assume that a firmware update is to be distributed to mobile devices, which takes around 90 seconds and demands 1.15 contacts on average. The number of taxies is scaled down to 100 because of the computational complexity involved in the matrix operations. Figs. 7 (a), (b), and (c) show the $(\alpha,\beta)$-guaranteed time for $\alpha \in [0, 1]$ and $\beta \in \{0.5, 0.9, 0.99\}$ with the number of seeds given by 1, 10, and 20, respectively. The figures show that if we target 90% penetration with 99% confidence (i.e., $(\alpha,\beta) = (0.9, 0.99)$), then the network with a single seed is estimated to take about 11.6 days (i.e., 278 hours) to achieve the target level of information spread. This estimate differs substantially from the existing estimate of the average time to achieve 90% penetration, which is close to 7 days. This clarifies that plans that depend on a successful spread to 90% of the nodes should allow about 4.6 more days. Otherwise, the planned work may not be executable on time. If a shorter time needs to be guaranteed to avoid delaying the plan, our framework suggests adding seeds to the network, as shown in Figs. 7 (b) and (c). As the number of seeds increases to 10 or 20, the time for 90% penetration with 99% confidence reduces from 278 hours to 137 and 113 hours, respectively. These predictions guide how to optimally plan the information spread.
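The kind of computation behind such predictions can be sketched for the homogeneous case as follows. This is a minimal illustration and not the implementation used for Fig. 7: it assumes the single-group cooperative chain with exit rates $i(N-i)\lambda^*$, replaces the matrix exponential of Lemma 2 with uniformization (a standard numerical technique that is swapped in here), and finds $G_{\alpha,\beta}$ by bisection. All function names are hypothetical, and the resulting time unit is the reciprocal of the unit of $\lambda^*$, which the text does not state explicitly.

```python
import math


def ccdf_absorption(t, rates):
    """P{T > t} for a pure-birth chain whose k-th transient state has exit rate rates[k].

    Uniformization: the chain is observed at the jumps of a Poisson process with
    rate max(rates); the surviving transient mass is averaged over Poisson weights.
    """
    if t <= 0.0:
        return 1.0
    big = max(rates)
    mean = big * t
    n_max = int(mean + 10.0 * math.sqrt(mean) + 50.0)  # heuristic truncation point
    probs = [1.0] + [0.0] * (len(rates) - 1)  # start in the first transient state
    total = 0.0
    for n in range(n_max + 1):
        # Poisson weight computed in log-space to avoid underflow for large mean
        log_w = -mean + n * math.log(mean) - math.lgamma(n + 1)
        total += math.exp(log_w) * sum(probs)
        nxt = [0.0] * len(rates)
        for k, r in enumerate(rates):
            move = probs[k] * r / big      # mass that advances to the next state
            nxt[k] += probs[k] - move      # mass that stays
            if k + 1 < len(rates):
                nxt[k + 1] += move         # otherwise the mass is absorbed
        probs = nxt
    return min(1.0, total)


def exit_rates(alpha, n, lam, seeds):
    """Exit rates i(N - i)lambda* of the states visited before alpha-completion."""
    target = math.ceil(round(alpha * n, 9))
    return [i * (n - i) * lam for i in range(seeds, target)]


def guaranteed_time(alpha, beta, n, lam, seeds=1, tol=1e-2):
    """Smallest t (up to tol) with P{T_alpha <= t} >= beta, found by bisection."""
    rates = exit_rates(alpha, n, lam, seeds)
    if not rates:
        return 0.0
    lo, hi = 0.0, sum(1.0 / r for r in rates)  # start the bracket at the mean
    while 1.0 - ccdf_absorption(hi, rates) < beta:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1.0 - ccdf_absorption(mid, rates) < beta:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    for seeds in (1, 10, 20):
        # N = 100 and lambda* = 4.14e-4 are taken from the setting described above
        print(seeds, guaranteed_time(0.9, 0.99, 100, 4.14e-4, seeds))
```

The printed values can be compared with the guaranteed times quoted above; any difference would point to modeling details (for example, the time unit or how seeds are counted) that this sketch does not capture.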

Similarly, we can study a heterogeneous network with two groups. Figs. 7 (d), (e), and (f) show the $(\alpha,\beta)$-guaranteed time for $\alpha \in [0, 1]$ and $\beta \in \{0.5, 0.9, 0.99\}$ with 1, 10, and 20 seeds, respectively. A direct comparison between Figs. 7 (a), (b), (c) and Figs. 7 (d), (e), (f) confirms our claim from Theorem 3 that the $(\alpha,\beta)$-guaranteed time in a heterogeneous network is shorter for lower $\alpha$, but longer for higher $\alpha$ close to 1. This implies that if it is mandatory to achieve 100% penetration, making the nodes in a network more homogeneous (by providing more resources to relatively inactive nodes) can be helpful when increasing the average contact rate is not possible due to resource constraints.

VI. Conclusion 

In this paper, we characterize the probabilistic guarantee of the time for information spread in opportunistic networks by developing a CTMC-based analytical framework and introducing the metric $G_{\alpha,\beta}$. We also identify the temporal scaling behavior of information spread for a set of key spread factors. Through various examples of application scenarios and simulations over the Shanghai taxi trace, we show that our framework enables us to estimate the proper amount of resources to allocate to a network for information spread by providing the detailed statistics of the guaranteed time for given penetration targets. We believe our framework can be viewed as an important first step in the design of highly sophisticated acceleration methods for information spread (or prevention methods for epidemics).

Fig. 6. (a) Number of contacts of a vehicle with all other vehicles during 28 days (vehicles sorted by number of contacts), (b) average number of neighbors when a node is in contact with another node (vehicles sorted by average number of neighbors), (c) CDF of aggregated contact durations (in minutes) between all taxi pairs.

Fig. 7. Distribution of the $(\alpha,\beta)$-guaranteed time for $\alpha \in [0, 1]$ and $\beta \in \{0.5, 0.90, 0.99\}$ with (a) 1 seed, (b) 10 seeds, and (c) 20 seeds in a homogeneous network and with (d) 1 seed, (e) 10 seeds, and (f) 20 seeds in a heterogeneous network with two groups.



Appendix A 
Proof of Equation (2) 

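For a given $t > 0$, let $N_{a,b}(t)$ be the total number of contacts between nodes $a$ and $b$ by time $t$. Since $\{N_{a,b}(t); t \ge 0\}$ is a Poisson process with rate $\lambda_{a,b}$, we have for $n = 0, 1, \ldots$:

$$P\{N_{a,b}(t) = n\} = \exp(-\lambda_{a,b}t)\frac{(\lambda_{a,b}t)^n}{n!}. \quad (14)$$

Since a contact between nodes $a$ and $b$ incurs infection with probability $\varphi_a\psi_b$, we have for $n = 0, 1, \ldots$ the following:

$$P\{M^{\mathrm{eff}}_{a,b} > t \mid N_{a,b}(t) = n\} = (1 - \varphi_a\psi_b)^n. \quad (15)$$

Hence, from (14) and (15), $P\{M^{\mathrm{eff}}_{a,b} > t\}$ is obtained by:

$$\begin{aligned}
P\{M^{\mathrm{eff}}_{a,b} > t\} &= \sum_{n=0}^{\infty} P\{M^{\mathrm{eff}}_{a,b} > t \mid N_{a,b}(t) = n\}\,P\{N_{a,b}(t) = n\} \\
&= \exp(-\lambda_{a,b}t)\sum_{n=0}^{\infty}\frac{((1 - \varphi_a\psi_b)\lambda_{a,b}t)^n}{n!} \\
&= \exp(-\lambda_{a,b}t)\exp((1 - \varphi_a\psi_b)\lambda_{a,b}t) \\
&= \exp(-\varphi_a\psi_b\lambda_{a,b}t).
\end{aligned}$$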
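The thinning argument above can be checked with a small Monte Carlo sketch: contacts arrive as a Poisson process with rate `lam`, each contact infects with probability `q` (standing in for $\varphi_a\psi_b$), and the time of the first infecting contact should then be exponential with rate `q * lam`, i.e., have mean $1/(q\lambda)$. The function names, parameter values, and seed are illustrative assumptions.

```python
import random


def first_infection_time(lam, q, rng):
    """Time of the first infecting contact under Poisson(lam) contacts thinned by q."""
    t = 0.0
    while True:
        t += rng.expovariate(lam)  # wait for the next contact
        if rng.random() < q:       # this contact results in an infection
            return t


def mean_first_infection_time(lam, q, samples=20000, seed=1):
    """Monte Carlo estimate of the mean first-infection time."""
    rng = random.Random(seed)
    return sum(first_infection_time(lam, q, rng) for _ in range(samples)) / samples


if __name__ == "__main__":
    lam, q = 2.0, 0.25
    print("estimate:", mean_first_infection_time(lam, q))
    print("theory  :", 1.0 / (q * lam))
```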

Appendix B 
Proof of Lemma 1 

The derivations in Examples I and II are easily extended to prove the main result, (P1), and (P2). The proof of (P3) is given in [16]. Hence, we omit the details and provide the intuition behind (P4). Suppose $I(t) = e_i$. Since $\{\sum_k I_k(t); t \ge 0\}$ is a counting process, state transitions occur to an adjacent state $e_j$ satisfying $\sum_k |j_k - i_k| = 1$ and $j_l = i_l + 1$ for some $l$. In addition, the time required to transit to such a state $e_j$ becomes

$$\min\{M^{\mathrm{eff}}_{a,b};\ a \in \mathcal{I}_1(t) \cup \cdots \cup \mathcal{I}_K(t),\ b \in \mathcal{S}_l(t)\}, \quad (16)$$

where $\mathcal{I}_k(t)$ and $\mathcal{S}_k(t)$ $(k = 1, \ldots, K)$ denote the index sets of infected nodes and susceptible nodes in group $k$ at time $t$, respectively. By (2) and the independence of $M^{\mathrm{eff}}_{a,b}$, the random variable in (16) follows an exponential distribution with rate $(N_l - i_l)\sum_k i_k\lambda^*_{k,l}$. This yields (P4).

Appendix C 

Explicit Expression for Fundamental Matrix F 

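In this appendix, we explicitly show the fundamental matrices $F$ for $K = 1, 2$ in Examples I and II. Suppose $K = 1$. Then, from Fig. 1, we can obtain the matrix $F$ as given in (13). Suppose $K = 2$. Then, from Fig. 2, the matrix $F$ is obtained as follows: for $i = 0, 1, \ldots, N_1$, define matrices $A_i$ and $B_i$ as

$$A_i = \begin{bmatrix}
-z_{i,0} & x_{i,0} & & & \\
 & -z_{i,1} & x_{i,1} & & \\
 & & \ddots & \ddots & \\
 & & & -z_{i,N_2-1} & x_{i,N_2-1} \\
 & & & & -z_{i,N_2}
\end{bmatrix}, \qquad
B_i = \begin{bmatrix}
y_{i,0} & & & & \\
 & y_{i,1} & & & \\
 & & y_{i,2} & & \\
 & & & \ddots & \\
 & & & & y_{i,N_2}
\end{bmatrix},$$

where the components $x_{i,j}$, $y_{i,j}$, and $z_{i,j}$ are defined as

$$\begin{aligned}
x_{i,j} &= i(N_2 - j)\lambda^*_{1,2} + j(N_2 - j)\lambda^*_{2,2}, \\
y_{i,j} &= i(N_1 - i)\lambda^*_{1,1} + j(N_1 - i)\lambda^*_{2,1}, \\
z_{i,j} &= x_{i,j} + y_{i,j}.
\end{aligned}$$

Then, the fundamental matrix $F$ is given by

$$F = \begin{bmatrix}
\tilde{A}_0 & \tilde{B}_0 & & & \\
 & A_1 & B_1 & & \\
 & & A_2 & \ddots & \\
 & & & \ddots & \tilde{B}_{N_1-1} \\
 & & & & \tilde{A}_{N_1}
\end{bmatrix},$$

where $\tilde{A}_0$ is obtained by eliminating the first row and the first column of $A_0$, and $\tilde{B}_0$ is obtained by eliminating the first row of $B_0$ (here, the elimination is for excluding state transitions from $(0,0)$ or to $(0,0)$). $\tilde{A}_{N_1}$ is obtained by eliminating the last row and the last column of $A_{N_1}$, and $\tilde{B}_{N_1-1}$ is obtained by eliminating the last column of $B_{N_1-1}$ (here, the elimination is for excluding state transitions from $(N_1, N_2)$ or to $(N_1, N_2)$).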



Appendix D 
Proof of Lemma 2 

Since the event $\{T_\alpha > t\}$ is equivalent to $\{I_\alpha(t) \in \mathcal{E}^*_\alpha\}$, we have

$$P\{T_\alpha > t\} = P\{I_\alpha(t) \in \mathcal{E}^*_\alpha\}. \quad (17)$$

Let $\pi_\alpha(t) = (P\{I_\alpha(t) = e\})_{e \in \mathcal{E}^*_\alpha}$ be the distribution of $I_\alpha(t)$ on $\mathcal{E}^*_\alpha$. Then, by the same reason as in (P3) of Lemma 1, we have

$$\pi_\alpha(t) = \pi_\alpha(0)\exp(F_\alpha t) = h_\alpha\exp(F_\alpha t).$$

Hence, the probability $P\{I_\alpha(t) \in \mathcal{E}^*_\alpha\}$ is derived as

$$P\{I_\alpha(t) \in \mathcal{E}^*_\alpha\} = |\pi_\alpha(t)| = h_\alpha\exp(F_\alpha t)\mathbf{1}. \quad (18)$$

By combining (17) and (18), the CDF $H_\alpha(t)\,(= P\{T_\alpha \le t\})$ is obtained as

$$H_\alpha(t) = 1 - P\{T_\alpha > t\} = 1 - h_\alpha\exp(F_\alpha t)\mathbf{1},$$

which proves the first formula in Lemma 2.

From basic Markov chain theory, the CDF $H_\alpha(t)$ can be expressed as follows [17, Eq. (1)]:

$$H_\alpha(t) = 1 - \sum_i \exp(\rho_i t)P_i(t), \quad (19)$$

where $\rho_i$ denote the nonzero eigenvalues of $Q_\alpha$. Since $\{\sum_k I_k(t); t \ge 0\}$ is a counting process, the infinitesimal generator $Q_\alpha$ is an upper triangular matrix. Hence, all the nonzero eigenvalues of $Q_\alpha$ come from the diagonal elements of the matrix $F_\alpha$, which are real and negative. Hence, (19) can be rewritten as

$$H_\alpha(t) = 1 - \sum_i \exp(-|\rho_i| t)P_i(t),$$

which proves the second formula in Lemma 2.

Appendix E 
Proof of Lemma 3 

It is clear from the formulas in Lemma 2 that the function $H_\alpha(\cdot)$ is strictly increasing. Hence, the $(\alpha,\beta)$-guaranteed time $G_{\alpha,\beta}$ is uniquely determined by solving $H_\alpha(G_{\alpha,\beta}) = \beta$. Since $H_\alpha(\cdot)$ is a bijective function, it has the inverse function $H_\alpha^{-1}(\cdot)$. Therefore, $G_{\alpha,\beta}$ is obtained by $G_{\alpha,\beta} = H_\alpha^{-1}(\beta)$. This proves (8).

In our model, the Markov chain $\{I_\alpha(t); t \ge 0\}$ is eventually absorbed into the absorbing state space $\mathcal{E}^\circ_\alpha$ with probability 1, which shows the existence of the inverse matrix of $F_\alpha$ [24, Lemma 2.2.1], [25, Theorem 2.4.3]. Under this condition, it is well known that the $n$th moment of $T_\alpha$ is given by (9) [24, Eq. (2.2.7)], [25, Eq. (2.13)], which completes the proof.



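$$F = \begin{bmatrix}
-(N-1)\lambda^* & (N-1)\lambda^* & & & & \\
 & -2(N-2)\lambda^* & 2(N-2)\lambda^* & & & \\
 & & -3(N-3)\lambda^* & \ddots & & \\
 & & & \ddots & \ddots & \\
 & & & & -2(N-2)\lambda^* & 2(N-2)\lambda^* \\
 & & & & & -(N-1)\lambda^*
\end{bmatrix} \quad (13)$$

Appendix F
Proof of Theorem 1

Suppose that the infection rate $\lambda^{\mathrm{eff}}_{a,b}$ is scaled by $\gamma\,(> 0)$ times for all $a, b$. In this proof, we add a tilde on top of any notation to distinguish it after the scaling. Note that by (P4) of Lemma 1 and (7), we have $\tilde{F}_\alpha = \gamma F_\alpha$. Hence, by Lemma 2 we have for all $t \ge 0$:

$$\begin{aligned}
P\{\tilde{T}_\alpha \le t\} &= 1 - h_\alpha\exp(\tilde{F}_\alpha t)\mathbf{1} \\
&= 1 - h_\alpha\exp(F_\alpha(\gamma t))\mathbf{1} \\
&= P\{T_\alpha \le \gamma t\}.
\end{aligned} \quad (20)$$

That is, $P\{\tilde{T}_\alpha \le t\} = P\{\gamma^{-1}T_\alpha \le t\}$ for all $t \ge 0$. This proves (10).

Let $\tilde{H}_\alpha(t) = P\{\tilde{T}_\alpha \le t\}$ be the CDF of $\tilde{T}_\alpha$. Then, (20) gives $\tilde{H}_\alpha(t) = H_\alpha(\gamma t)$, which yields $\tilde{H}_\alpha^{-1}(\beta) = \gamma^{-1}H_\alpha^{-1}(\beta)$. By (8) in Lemma 3, we further have

$$\tilde{G}_{\alpha,\beta} = \tilde{H}_\alpha^{-1}(\beta) = \gamma^{-1}H_\alpha^{-1}(\beta) = \gamma^{-1}G_{\alpha,\beta}.$$

This proves $\tilde{G}_{\alpha,\beta} = \gamma^{-1}G_{\alpha,\beta}$. From (10), $(\tilde{T}_\alpha)^n$ and $\gamma^{-n}(T_\alpha)^n$ have the same distribution. By taking expectations, $E[(\tilde{T}_\alpha)^n] = \gamma^{-n}E[(T_\alpha)^n]$. This completes the proof.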




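Appendix G
Proof of $\tilde{\mathcal{M}}(t) = \mathcal{M}(\gamma t)$ and $\tilde{\mathcal{V}}(t) = \gamma\mathcal{V}(\gamma t)$

Similarly to the proof of Theorem 1, we add a tilde on top of any notation to distinguish it after the scaling. Since the random variable $\sum_k I_k(t)$ takes on only nonnegative integer values from 0 to $N$, the expectation $\mathcal{M}(t)\,(= E[\sum_k I_k(t)])$ can be obtained by $\mathcal{M}(t) = \sum_{i=1}^{N} P\{\sum_k I_k(t) \ge i\}$. By Definition 1, the event $\{\sum_k I_k(t) \ge i\}$ is equivalent to $\{T_{i/N} \le t\}$. Hence, the expectation $\mathcal{M}(t)$ is given by

$$\mathcal{M}(t) = \sum_{i=1}^{N} P\{T_{i/N} \le t\}. \quad (21)$$

Similarly, the expectation $\tilde{\mathcal{M}}(t)$ after the scaling is given by

$$\tilde{\mathcal{M}}(t) = \sum_{i=1}^{N} P\{\tilde{T}_{i/N} \le t\}. \quad (22)$$

By (10) in Theorem 1, the probability in (22) satisfies $P\{\tilde{T}_{i/N} \le t\} = P\{T_{i/N} \le \gamma t\}$. Thus, from (21) we have

$$\tilde{\mathcal{M}}(t) = \sum_{i=1}^{N} P\{T_{i/N} \le \gamma t\} = \mathcal{M}(\gamma t).$$

By using the relation $\tilde{\mathcal{M}}(t) = \mathcal{M}(\gamma t)$, it is straightforward to show that $\tilde{\mathcal{V}}(t)\,(= \frac{d}{dt}\tilde{\mathcal{M}}(t)) = \gamma\mathcal{V}(\gamma t)$.

Appendix H
Proof of Theorem 2

In this appendix, we will prove the following in order:

(T1) $G_{\alpha,\beta}$ is strictly decreasing with $N$ for sufficiently large $\beta$.

(T2) $E[T_\alpha]$ is strictly decreasing with $N$.

(T3) $G_{\alpha,\beta} = \Theta\big((\lambda^*)^{-1}N^{-1}(\log N - \log(\log\frac{1}{\beta}))\big)$.

(T4) $E[T_\alpha] = \Theta\big((\lambda^*)^{-1}N^{-1}\log N\big)$.

In the proof, we add the subscript $N$ to the variables $T_\alpha$ and $G_{\alpha,\beta}$ to explicitly denote the assumed population size.

Proof of (T1): When $\alpha = 1$, we have $F_\alpha = F$, and accordingly all the eigenvalues of $F_\alpha$ come from the diagonal elements of the matrix $F$. In addition, when $K = 1$, the diagonal elements of $F$ can be obtained from (13) by $\rho_i = -i(N - i)\lambda^*$, $i = 1, 2, \ldots, \lfloor N/2 \rfloor$. Hence, by Lemma 2, we have

$$P\{T_{\alpha,N} > t\} = \sum_{i=1}^{\lfloor N/2 \rfloor}\exp(-i(N - i)\lambda^* t)\,P_{i,N}(t). \quad (23)$$

Similarly, for a network with $N + 1$ nodes, we have

$$P\{T_{\alpha,N+1} > t\} = \sum_{i=1}^{\lfloor (N+1)/2 \rfloor}\exp(-i(N + 1 - i)\lambda^* t)\,P_{i,N+1}(t). \quad (24)$$

It is straightforward to show that the ratio of (24) to (23) converges as

$$\lim_{t\to\infty}\frac{P\{T_{\alpha,N+1} > t\}}{P\{T_{\alpha,N} > t\}} = 0.$$

Thus, there exists $t^*\,(> 0)$ such that $P\{T_{\alpha,N+1} > t\} < P\{T_{\alpha,N} > t\}$ for all $t > t^*$. Let $\beta^* = H_\alpha(t^*)$. Then, as is evident from Fig. 8, we have $G_{\alpha,\beta,N} > G_{\alpha,\beta,N+1}$ for any $\beta > \beta^*$.

Fig. 8. Proof of (T1) in Theorem 2: $G_{\alpha,\beta,N} > G_{\alpha,\beta,N+1}$ for any $\beta > \beta^*$.

Proof of (T2): From Fig. 1, the expectation $E[T_{\alpha,N}]$ is obtained by $E[T_{\alpha,N}] = \frac{1}{\lambda^*}\sum_{i=1}^{N-1}\frac{1}{i(N-i)}$. Similarly, for a network with $N + 1$ nodes, we have $E[T_{\alpha,N+1}] = \frac{1}{\lambda^*}\sum_{i=1}^{N}\frac{1}{i(N+1-i)}$. By using these formulas for $E[T_{\alpha,N}]$ and $E[T_{\alpha,N+1}]$, we can show that $E[T_{\alpha,N+1}] - E[T_{\alpha,N}] < 0$ for all $N = 1, 2, \ldots$ as follows:

$$\begin{aligned}
E[T_{\alpha,N+1}] - E[T_{\alpha,N}]
&= \frac{1}{\lambda^*}\left(\sum_{i=1}^{N}\frac{1}{i(N+1-i)} - \sum_{i=1}^{N-1}\frac{1}{i(N-i)}\right) \\
&= \frac{1}{\lambda^*}\left(\frac{1}{N} + \sum_{i=1}^{N-1}\frac{1}{N-i}\left(\frac{1}{i+1} - \frac{1}{i}\right)\right) \\
&< \frac{1}{\lambda^*}\left(\frac{1}{N} + \frac{1}{N-1}\sum_{i=1}^{N-1}\left(\frac{1}{i+1} - \frac{1}{i}\right)\right) \\
&= \frac{1}{\lambda^*}\left(\frac{1}{N} - \frac{1}{N}\right) = 0,
\end{aligned} \quad (25)$$

which proves (T2).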
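The monotonicity in (T2) and the order in (T4) can be checked numerically with the sketch below. It evaluates $E[T_{\alpha,N}] = \frac{1}{\lambda^*}\sum_{i=1}^{N-1}\frac{1}{i(N-i)}$ directly and compares it, for odd $N$, with integral bounds of the kind used in the proof of (T4). The closed forms in `upper_bound` and `lower_bound` were obtained here by evaluating $\int \frac{dx}{x(N-x)} = \frac{1}{N}\log\frac{x}{N-x}$ and should be read as a reconstruction under that assumption; the function names are hypothetical.

```python
import math


def mean_completion_time(n, lam=1.0):
    """E[T_{alpha,N}] for alpha = 1 and a single seed in the cooperative model."""
    return sum(1.0 / (i * (n - i)) for i in range(1, n)) / lam


def upper_bound(n, lam=1.0):
    """(2/lam) * (f(1) + integral of f over [1, (N-1)/2]), f(x) = 1/(x(N-x)), odd N."""
    return (2.0 / lam) * (1.0 / (n - 1) + math.log((n - 1) ** 2 / (n + 1)) / n)


def lower_bound(n, lam=1.0):
    """(2/lam) * integral of f over [1, (N+1)/2], f(x) = 1/(x(N-x)), odd N."""
    return (2.0 / (lam * n)) * math.log(n + 1)


if __name__ == "__main__":
    for n in (11, 101, 1001):
        print(n, lower_bound(n), mean_completion_time(n), upper_bound(n))
```

In this sketch both bounds are of order $N^{-1}\log N$, which is the scaling claimed in (T4).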

Proof of (T3): To prove (T3), we need the following lemma.

Lemma 4. Let $Z_i$ $(i = 1, \ldots, n)$ be a sequence of independent exponential random variables with rates $\eta i$. Then, the sum $Z = \sum_{i=1}^{n} Z_i$ has the complementary cumulative distribution function (CCDF) given by:

$$P\{Z > z\} = \sum_{i=1}^{n}(-1)^{i-1}\binom{n}{i}\exp(-\eta i z).$$

Proof: It is well known that the sum of $n$ independent exponential random variables with rates $r_i$ $(i = 1, \ldots, n)$ follows the generalized Erlang distribution. When $r_i \ne r_j$ for all $i \ne j$, i.e., in our case $r_i = \eta i$, the CCDF of the generalized Erlang distribution is given by

$$P\{Z > z\} = \sum_{i=1}^{n}\left(\prod_{j=1, j\ne i}^{n}\frac{r_j}{r_j - r_i}\right)\exp(-r_i z). \quad (26)$$

Replacing $r_i$ by $\eta i$ and simplifying (26) yield the lemma. ■
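A quick numerical check of Lemma 4 is sketched below: it evaluates the alternating sum of the lemma, the generalized Erlang formula (26) with rates $\eta, 2\eta, \ldots, n\eta$, and the closed form $1 - (1 - e^{-\eta z})^n$. The last expression is not stated in the text; it follows from the binomial theorem and is included only as an independent cross-check. Function names and parameter values are illustrative.

```python
import math


def ccdf_lemma4(z, n, eta):
    """Alternating-sum CCDF of Lemma 4 for rates eta, 2*eta, ..., n*eta."""
    return sum((-1) ** (i - 1) * math.comb(n, i) * math.exp(-eta * i * z)
               for i in range(1, n + 1))


def ccdf_generalized_erlang(z, rates):
    """CCDF (26) of a sum of independent exponentials with pairwise distinct rates."""
    total = 0.0
    for i, ri in enumerate(rates):
        coeff = 1.0
        for j, rj in enumerate(rates):
            if j != i:
                coeff *= rj / (rj - ri)
        total += coeff * math.exp(-ri * z)
    return total


def ccdf_binomial_form(z, n, eta):
    """Cross-check: 1 - (1 - exp(-eta z))^n, obtained via the binomial theorem."""
    return 1.0 - (1.0 - math.exp(-eta * z)) ** n


if __name__ == "__main__":
    n, eta, z = 8, 0.5, 3.0
    rates = [eta * i for i in range(1, n + 1)]
    print(ccdf_lemma4(z, n, eta))
    print(ccdf_generalized_erlang(z, rates))
    print(ccdf_binomial_form(z, n, eta))
```

For moderate $n$ the three values should agree to within floating-point error; for large $n$ the alternating sums lose precision, so the binomial form is the safer one to evaluate.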

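Suppose that $N$ is an odd number. When $N$ is an even number, we can use similar steps for the proof, and hence we only consider the case when $N$ is odd. From Fig. 1, the $\alpha$-completion time is obtained by

$$T_{\alpha,N} = \sum_{i=1}^{(N-1)/2}(X_i + Y_i), \quad (27)$$

where $X_i$ and $Y_i$ are independent and identically distributed exponential random variables with rates $i(N - i)\lambda^*$. To give a bound on $T_{\alpha,N}$, we introduce random variables $T_{\mathrm{upper}} = \sum_{i=1}^{N-1} Z_i^{u}$ and $T_{\mathrm{lower}} = \sum_{i=1}^{N-1} Z_i^{l}$, where $Z_i^{u}$ and $Z_i^{l}$ are independent exponential random variables with rates $\lambda^* N i/4$ and $\lambda^* N i$, respectively. For any two random variables $A$ and $B$, we use $A \preceq B$ to denote $P\{A > x\} \le P\{B > x\}$ for all $x \in \mathbb{R}$. In the following, we show that

$$T_{\mathrm{lower}} \preceq T_{\alpha,N} \preceq T_{\mathrm{upper}}. \quad (28)$$

Since the rate of $X_i$ is greater than that of $Z_{2i}^{u}$, we have $X_i \preceq Z_{2i}^{u}$ for $i = 1, 2, \ldots, \frac{N-1}{2}$. In addition, since $Z_{2i}^{u} \preceq Z_{2i-1}^{u}$ by the same reason and $X_i$ and $Y_i$ are identically distributed, we have $Y_i \preceq Z_{2i-1}^{u}$ for $i = 1, 2, \ldots, \frac{N-1}{2}$. Therefore, we have $\sum_{i=1}^{(N-1)/2}(X_i + Y_i) \preceq \sum_{i=1}^{N-1} Z_i^{u}$. That is, $T_{\alpha,N} \preceq T_{\mathrm{upper}}$. By using a similar approach as before, we can easily obtain $T_{\mathrm{lower}} \preceq T_{\alpha,N}$. Due to similarity, we omit the details.

Let $t^*_{\mathrm{upper}} = \frac{4}{\lambda^* N}\{\log(N - 1) - \log(\log\frac{1}{\beta})\}$, and $t^*_{\mathrm{lower}} = \frac{1}{4}t^*_{\mathrm{upper}}$. Then, we have the following, which will be shown in the sequel:

$$\lim_{N\to\infty} P\{T_{\mathrm{lower}} > t^*_{\mathrm{lower}}\} = 1 - \beta, \qquad \lim_{N\to\infty} P\{T_{\mathrm{upper}} > t^*_{\mathrm{upper}}\} = 1 - \beta. \quad (29)$$

The results in (28) and (29) show that there exists $N^* \in \mathbb{N}$ such that

$$t^*_{\mathrm{lower}} \le G_{\alpha,\beta,N} \le t^*_{\mathrm{upper}} \quad \text{for all } N \ge N^*,$$

which gives $G_{\alpha,\beta,N} = \Theta\big((\lambda^*)^{-1}N^{-1}(\log N - \log(\log\frac{1}{\beta}))\big)$.

It remains to prove (29). By Lemma 4, the CCDF of $T_{\mathrm{lower}}$ is given by

$$P\{T_{\mathrm{lower}} > t\} = \sum_{i=1}^{N-1}(-1)^{i-1}\binom{N-1}{i}\exp(-\lambda^* N i t).$$

Hence, $P\{T_{\mathrm{lower}} > t^*_{\mathrm{lower}}\}$ is simplified as

$$\begin{aligned}
P\{T_{\mathrm{lower}} > t^*_{\mathrm{lower}}\}
&= \sum_{i=1}^{N-1}(-1)^{i-1}\binom{N-1}{i}\left(\frac{1}{N-1}\log\frac{1}{\beta}\right)^{i} \\
&= 1 - \left(1 + \frac{\log\beta}{N-1}\right)^{N-1}.
\end{aligned} \quad (30)$$

By letting $N$ go to $\infty$, we have

$$\lim_{N\to\infty} P\{T_{\mathrm{lower}} > t^*_{\mathrm{lower}}\} = 1 - \exp(\log\beta) = 1 - \beta,$$

which proves the first equality in (29). Similarly as above, we can prove the second equality in (29) and omit the detailed derivations.

Fig. 9. Transition diagram of the Markov chain $\{I(t); t \ge 0\}$ when $K = 1$ in the non-cooperative model.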

Proof of (T4): Suppose that $N$ is an odd number. When $N$ is an even number, we can use similar steps for the proof, and hence we only consider the case when $N$ is odd. From (27), the expectation of $T_{\alpha,N}$ is given by

$$E[T_{\alpha,N}] = \frac{2}{\lambda^*}\sum_{i=1}^{(N-1)/2}\frac{1}{i(N-i)}. \quad (31)$$

For notational simplicity, we define a function $f(x) = \frac{1}{x(N-x)}$ for $0 < x < N$. Since $f(\cdot)$ is a strictly decreasing convex function on $(0, N/2]$, the finite series in (31) is bounded above as follows:

$$E[T_{\alpha,N}] = \frac{2}{\lambda^*}\sum_{i=1}^{(N-1)/2} f(i) \le \frac{2}{\lambda^*}\left(f(1) + \int_{1}^{(N-1)/2} f(x)\,dx\right).$$

Using basic calculus, we obtain $f(1) + \int_{1}^{(N-1)/2} f(x)\,dx = \frac{1}{N-1} + \frac{1}{N}\log\frac{(N-1)^2}{N+1}$, which gives $E[T_{\alpha,N}] = O((\lambda^*)^{-1}N^{-1}\log N)$.

By the same reason as above, the finite series in (31) is bounded below as follows:

$$E[T_{\alpha,N}] \ge \frac{2}{\lambda^*}\int_{1}^{(N+1)/2} f(x)\,dx = \frac{2}{\lambda^* N}\log(N + 1),$$

which gives $E[T_{\alpha,N}] = \Omega((\lambda^*)^{-1}N^{-1}\log N)$. This completes the proof.

Appendix I
Proof of Remark 1

In the non-cooperative model, the number of infected nodes evolves as a CTMC with the transition diagram depicted in Fig. 9. Let $T^\circ_{\alpha,N}$ denote the $\alpha$-completion time in a network with $N$ non-cooperative nodes. As in the proof of Theorem 2, we use the subscript $N$ to explicitly denote the assumed population size. From Fig. 9, we have

$$T^\circ_{\alpha,N} = \sum_{i=1}^{N-1} Z^\circ_i, \quad (32)$$

where $Z^\circ_i$ are independent exponential random variables with rates $\lambda^* i$. From (32), it is clear that $T^\circ_{\alpha,N+1} = T^\circ_{\alpha,N} + Z^\circ_N$, which gives

$$P\{T^\circ_{\alpha,N} > t\} < P\{T^\circ_{\alpha,N+1} > t\} \quad \text{for all } t > 0.$$

Therefore, the guaranteed time $G_{\alpha,\beta,N}$ is strictly increasing with $N$. From (32), it is also clear that $E[T^\circ_{\alpha,N}] = \frac{1}{\lambda^*}\sum_{i=1}^{N-1}\frac{1}{i}$, which is the $(N - 1)$th partial sum of the harmonic series divided by $\lambda^*$. Hence, the average $E[T^\circ_{\alpha,N}]$ is strictly increasing with $N$, and it scales as $\Theta((\lambda^*)^{-1}\log N)$.

By Lemma 4, the CCDF of $T^\circ_{\alpha,N}$ is given by

$$P\{T^\circ_{\alpha,N} > t\} = \sum_{i=1}^{N-1}(-1)^{i-1}\binom{N-1}{i}\exp(-\lambda^* i t).$$

Let $t^* = \frac{1}{\lambda^*}\{\log(N - 1) - \log(\log\frac{1}{\beta})\}$. By using the same approach as in (30), we can derive

$$P\{T^\circ_{\alpha,N} > t^*\} = 1 - \left(1 + \frac{\log\beta}{N-1}\right)^{N-1}.$$

Due to similarity, we omit the details. By letting $N$ go to $\infty$, we have $\lim_{N\to\infty} P\{T^\circ_{\alpha,N} > t^*\} = 1 - \exp(\log\beta) = 1 - \beta$, which proves $G_{\alpha,\beta} = \Theta\big((\lambda^*)^{-1}(\log N - \log(\log\frac{1}{\beta}))\big)$.
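For the non-cooperative model, the guaranteed time can also be written in closed form, which gives a simple way to see the scaling numerically. The sketch below assumes $\alpha = 1$ and one seed, and uses the fact that the CCDF above equals $1 - (1 - e^{-\lambda^* t})^{N-1}$ by the binomial theorem, so that solving $(1 - e^{-\lambda^* t})^{N-1} = \beta$ gives $t = -\frac{1}{\lambda^*}\log(1 - \beta^{1/(N-1)})$. The function names are hypothetical.

```python
import math


def guaranteed_time_non_cooperative(beta, n, lam=1.0):
    """Solve (1 - exp(-lam * t))^(n - 1) = beta for t (alpha = 1, one seed)."""
    # -expm1(x) = 1 - exp(x); this avoids cancellation when beta^(1/(n-1)) is near 1
    return -math.log(-math.expm1(math.log(beta) / (n - 1))) / lam


def t_star(beta, n, lam=1.0):
    """Asymptotic expression (log(N - 1) - log(log(1 / beta))) / lam."""
    return (math.log(n - 1) - math.log(math.log(1.0 / beta))) / lam


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, guaranteed_time_non_cooperative(0.9, n), t_star(0.9, n))
```

Under these assumptions the two columns should approach each other as $N$ grows, and both should increase like $\log N$.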

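In the following, we will prove the results in the table. We first consider the cooperative model. Since $T_{\alpha,N}$ is obtained by the sum of independent exponential random variables with rates $i(N - i)\lambda^*$ $(i = 1, 2, \ldots, N - 1)$ (see Fig. 1), the variance of $T_{\alpha,N}$, denoted by $\mathrm{Var}(T_{\alpha,N})$, is derived as

$$\mathrm{Var}(T_{\alpha,N}) = \frac{1}{(\lambda^*)^2}\sum_{i=1}^{N-1}\frac{1}{(i(N-i))^2}. \quad (33)$$

By using a similar approach as in (25), we have

$$\begin{aligned}
\mathrm{Var}(T_{\alpha,N+1}) - \mathrm{Var}(T_{\alpha,N})
&= \frac{1}{(\lambda^*)^2}\left(\frac{1}{N^2} + \sum_{i=1}^{N-1}\frac{1}{(N-i)^2}\left(\frac{1}{(i+1)^2} - \frac{1}{i^2}\right)\right) \\
&< \frac{1}{(\lambda^*)^2}\left(\frac{1}{N^2} + \frac{1}{(N-1)^2}\sum_{i=1}^{N-1}\left(\frac{1}{(i+1)^2} - \frac{1}{i^2}\right)\right) \\
&= \frac{1}{(\lambda^*)^2}\left(\frac{1}{N^2} + \frac{1}{(N-1)^2}\left(\frac{1}{N^2} - 1\right)\right) \\
&= \frac{-2}{(\lambda^*)^2 N^2 (N - 1)} < 0,
\end{aligned}$$

which proves that the variance of $T_{\alpha,N}$ under the cooperative model is strictly decreasing with $N$. To prove the order of $\mathrm{Var}(T_{\alpha,N})$, we use a similar approach as in the proof of (T4) in Appendix H. By noting that (i) $\mathrm{Var}(T_{\alpha,N}) = \frac{2}{(\lambda^*)^2}\sum_{i=1}^{(N-1)/2}(f(i))^2$ for an odd $N$ by (33), and (ii) $(f(x))^2\,\big(= \frac{1}{(x(N-x))^2}\big)$ is a strictly decreasing convex function on $(0, N/2]$, we have

$$\mathrm{Var}(T_{\alpha,N}) \le \frac{2}{(\lambda^*)^2}\left((f(1))^2 + \int_{1}^{(N-1)/2}(f(x))^2\,dx\right).$$

Using basic calculus, we obtain $(f(1))^2 + \int_{1}^{(N-1)/2}(f(x))^2\,dx = \frac{1}{(N-1)^2} + \frac{1}{N^2} - \frac{N+5}{N^2(N^2-1)} + \frac{2}{N^3}\log\frac{(N-1)^2}{N+1}$, which gives $\mathrm{Var}(T_{\alpha,N}) = O((\lambda^* N)^{-2})$. By the same reason as above, we have the following lower bound:

$$\mathrm{Var}(T_{\alpha,N}) \ge \frac{2}{(\lambda^*)^2}\int_{1}^{(N+1)/2}(f(x))^2\,dx = \frac{2}{(\lambda^*)^2}\left(\frac{1}{N^2} - \frac{N-3}{N^2(N^2-1)} + \frac{2}{N^3}\log(N+1)\right),$$

which gives $\mathrm{Var}(T_{\alpha,N}) = \Omega((\lambda^* N)^{-2})$. Hence, we have $\mathrm{Var}(T_{\alpha,N}) = \Theta((\lambda^* N)^{-2})$. Similarly as above, we can prove that the skewness of $T_{\alpha,N}$ is also strictly decreasing with $N$, and it scales as $\Theta((\lambda^* N)^{-3})$. Due to similarity, we omit the detailed derivations. In our model, the Markov chain $\{I_\alpha(t); t \ge 0\}$ is eventually absorbed into the absorbing state space $\mathcal{E}^\circ_\alpha$ with probability 1, which shows the existence of the finite $n$th moment of $T_{\alpha,N}$ [24, Eq. (2.2.7)].

We next consider the non-cooperative model. By the independence of $Z^\circ_i$, the variance of $T^\circ_{\alpha,N}$ is obtained by $\mathrm{Var}(T^\circ_{\alpha,N}) = \frac{1}{(\lambda^*)^2}\sum_{i=1}^{N-1}\frac{1}{i^2}$. Hence, it is strictly increasing with $N$ and converges to $(\lambda^*)^{-2}\zeta(2)$ as $N$ goes to $\infty$. Similarly as above, we can prove that the skewness of $T^\circ_{\alpha,N}$ is strictly increasing with $N$ and converges to $(\lambda^*)^{-3}\zeta(3)$ as $N$ goes to $\infty$. Due to similarity, we omit the detailed derivations. Since $E[T^\circ_{\alpha,N}]$ diverges as $N$ goes to $\infty$, all the other $n$th moments for $n = 2, 3, \ldots$ are also divergent in this limit, as shown below:

$$E[(T^\circ_{\alpha,N})^n] = \int_{0}^{\infty} x^n\,dP\{T^\circ_{\alpha,N} \le x\} \ge \int_{1}^{\infty} x\,dP\{T^\circ_{\alpha,N} \le x\} \ge E[T^\circ_{\alpha,N}] - 1 \to \infty.$$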

Appendix J 
Proof of Theorem 3 

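Without loss of generality, we assume $\lambda^*_{1,1} \ge \lambda^*_{2,2}$. Then, by the condition (11), the infection rates can be written in terms of $\lambda^*$ and $\gamma$ as follows:

$$\lambda^*_{1,1} = \frac{2\gamma\lambda^*}{\gamma + 1}, \qquad \lambda^*_{2,2} = \frac{2\lambda^*}{\gamma + 1}, \qquad \lambda^*_{1,2} = \lambda^*. \quad (34)$$

By the second formula for $H_\alpha(\cdot)$ in Lemma 2, we have $D_\alpha(\gamma) = -\max_i \rho_i\,(= \min_i |\rho_i|)$. As shown in Lemma 2, the diagonal elements of $F_\alpha$ constitute $\rho_i$, which are the negatives of the transition rates from $(i_1, i_2)$ to the set $\{(i_1 + 1, i_2), (i_1, i_2 + 1)\}$ for $(i_1, i_2) \in \mathcal{E}^*_\alpha$. Hence, we can obtain $D_\alpha(\gamma)$ by solving the following maximization problem:

$$D_\alpha(\gamma) = -\max_{(i_1,i_2)\in\mathcal{E}^*_\alpha} p(i_1, i_2), \quad (35)$$

where

$$p(i_1, i_2) = -\sum_{k=1}^{2}\left(\frac{N}{2} - i_k\right)\sum_{l=1}^{2} i_l\,\lambda^*_{l,k}, \quad (36a)$$

$$\mathcal{E}^*_\alpha = \left\{(i_1, i_2) \in \mathbb{Z}^2 \,\middle|\, 1 \le i_1 + i_2 \le \lceil\alpha N\rceil - 1,\ 1 \le i_1 \le N/2,\ 0 \le i_2 \le N/2\right\}. \quad (36b)$$

Note that

$$\frac{\partial^2 p(i_1, i_2)}{\partial(i_1)^2} = 2\lambda^*_{1,1} > 0, \qquad \frac{\partial^2 p(i_1, i_2)}{\partial(i_2)^2} = 2\lambda^*_{2,2} > 0,$$

which implies that the maximum of $p(i_1, i_2)$ in (35) occurs at a vertex of the set $\mathcal{E}^*_\alpha$ in (36b) (see Fig. 10).

Fig. 10. Domain $\mathcal{E}^*_\alpha$ of the maximization problem in (35) when $\lceil\alpha N\rceil - 2 < \frac{N}{2}$ (left), $\lceil\alpha N\rceil - 2 = \frac{N}{2}$ (middle), and $\lceil\alpha N\rceil - 2 > \frac{N}{2}$ (right).

Suppose $\lceil\alpha N\rceil - 2 < N/2$. Then, from the figure, the vertices of $\mathcal{E}^*_\alpha$ are given by $(1, 0)$, $(1, \lceil\alpha N\rceil - 2)$, and $(\lceil\alpha N\rceil - 1, 0)$, and thus we have

$$\max_{(i_1,i_2)\in\mathcal{E}^*_\alpha} p(i_1, i_2) = \max\{p(1, 0),\ p(1, \lceil\alpha N\rceil - 2),\ p(\lceil\alpha N\rceil - 1, 0)\} = p(1, 0).$$

Similarly as above, we can solve the maximization problem for each of the cases when $\lceil\alpha N\rceil - 2 = N/2$ and $\lceil\alpha N\rceil - 2 > N/2$. Since it is straightforward, we summarize the results without detailed derivations:

$$\max_{(i_1,i_2)\in\mathcal{E}^*_\alpha} p(i_1, i_2) =
\begin{cases}
p(1, 0), & \alpha \le 1 - \frac{2}{N}, \\
p\big(\frac{N}{2}, \lceil\alpha N\rceil - 1 - \frac{N}{2}\big), & \alpha = 1, \\
p(1, 0), & 1 - \frac{2}{N} < \alpha < 1,\ \gamma < \frac{5N-16}{N-4}, \\
p\big(\frac{N}{2}, \lceil\alpha N\rceil - 1 - \frac{N}{2}\big), & 1 - \frac{2}{N} < \alpha < 1,\ \gamma > \frac{5N-16}{N-4}.
\end{cases}$$

From (34) and (36a), we have

$$\begin{aligned}
p(1, 0) &= -\left(\frac{(N - 2)\gamma\lambda^*}{\gamma + 1} + \frac{N\lambda^*}{2}\right), \\
p\Big(\frac{N}{2}, \lceil\alpha N\rceil - 1 - \frac{N}{2}\Big) &= -(N - \lceil\alpha N\rceil + 1)\left\{\Big(\lceil\alpha N\rceil - 1 - \frac{N}{2}\Big)\frac{2\lambda^*}{\gamma + 1} + \frac{N\lambda^*}{2}\right\}.
\end{aligned}$$

Therefore, $\frac{\partial}{\partial\gamma}p(1, 0) < 0$ and $\frac{\partial}{\partial\gamma}p\big(\frac{N}{2}, \lceil\alpha N\rceil - 1 - \frac{N}{2}\big) > 0$ for all $\gamma > 1$, which proves the theorem.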
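The behavior of the decay rate in $\gamma$ can be explored by brute force, without relying on the vertex argument. The sketch below assumes the special case in which the inter-group rate is the average of the intra-group rates (so that the rates take the form $2\gamma\lambda^*/(\gamma+1)$, $2\lambda^*/(\gamma+1)$, and $\lambda^*$), two groups of size $N/2$, and a seed in group 1; it computes the smallest total exit rate over all states with at least one infected node in group 1 and fewer than $\lceil\alpha N\rceil$ infected nodes overall. The exit-rate expression is written out from the transition structure described in Appendix C, and all names are hypothetical.

```python
import math


def group_rates(gamma, lam=1.0):
    """Intra- and inter-group rates (l11, l22, l12) for heterogeneity level gamma."""
    return 2.0 * gamma * lam / (gamma + 1.0), 2.0 * lam / (gamma + 1.0), lam


def exit_rate(i1, i2, n, gamma, lam=1.0):
    """Total rate of leaving state (i1, i2): i_k infected nodes in group k."""
    l11, l22, l12 = group_rates(gamma, lam)
    half = n // 2
    to_group1 = (half - i1) * (i1 * l11 + i2 * l12)  # new infection in group 1
    to_group2 = (half - i2) * (i1 * l12 + i2 * l22)  # new infection in group 2
    return to_group1 + to_group2


def decay_rate(alpha, n, gamma, lam=1.0):
    """Smallest exit rate over the transient states, i.e., the tail decay rate."""
    half = n // 2
    top = math.ceil(round(alpha * n, 9)) - 1  # largest transient total of infected
    return min(exit_rate(i1, i2, n, gamma, lam)
               for i1 in range(1, half + 1)
               for i2 in range(0, half + 1)
               if 1 <= i1 + i2 <= top)


if __name__ == "__main__":
    n = 40
    for alpha in (0.5, 0.975, 1.0):
        print(alpha, [round(decay_rate(alpha, n, g), 3) for g in (1, 2, 4, 6, 8)])
```

With $N = 40$, the printed rows give a quick view of whether the decay rate grows or shrinks with $\gamma$ for a low target, a target just below full penetration, and full penetration.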